At first, I was skeptical. The first set of words was only more or less useful, and the second could also be improved (though since I am only starting to study data analysis, some of those words do not make sense for me to use right now because I don't yet know their meaning), but everything changed with the third set. The third is practically perfect because you included a lot of very important words I hadn't thought of when I gave you my set of words. I don't know whether you could make your algorithm improve over time: for example, the student could mark some words as more important than others as they appear, and the algorithm would use those as new keywords to extend the wordlist towards an even more personalized one. That could also be a good feature.
More importantly, regarding the second set, this could turn things around a little: students could learn the meaning of some words through these courses and deepen their knowledge of their own interests. Since most students come to Lingvist to learn, I am sure none of them would mind learning something besides the language, as long as it is something they like. So it's a win-win: they learn the language as well as something related to their interests.
Regarding the last point, I agree that this is crucial for you. Continue to include translations, because these words may be very different from their equivalents in the native language. As for synonyms, I am not sure you should work on that (and if you do, be careful): at least for my sets, I am afraid synonyms would not make sense in the first set. Although some words have similar meanings, they are not always interchangeable, and presenting them as synonyms can mislead (a kind reminder: students don't know the target language yet, so they will believe anything you show them). Definitions, on the other hand, would put your website a few levels above any other: students wouldn't be learning just a language, they would be learning two things they want to learn at the same time.
As a last remark, don't forget to consider how you will present words to the student. Basic vocabulary and grammar remain more important than these sets. For most people, words like the ones in my sets appear about once every 1,000,000 words. For me, it will probably be closer to once every 1,000. But words like "I", "you" and "and" can appear as often as once every 10 or 100 words, which is still 10 to 100 times more often than the words in my sets. For instance, I cannot build a sentence with just "water", "training", "marathon" and "performance". To say "Water intake as well as structured training will lead to a good marathon performance", I also need the words "intake", "as", "well", "structured", "will", "lead", "to", "a" and "good".

Personally, if it were possible, I would go a little further than what you have done so far. I would start by teaching the student basic vocabulary and grammar for about the first 100 to 200 words. Then I would introduce a few words from these sets. Finally, I would start mixing the two, sometimes asking the learner for a word from these sets and other times asking for a common word. And here's the novelty: as the user progresses through these sets and the common words of the target language, the sentences could gradually get longer. This would demand a little more language comprehension from the student while still drawing on both "sets" of words (by "sets" here, I mean my sets and the set of common words of the language).
Let me finish by saying that what you are trying to accomplish is outstanding, to say the least, especially if done well. Thank you for letting me be part of this experiment (I look forward to taking part in others, by the way), and good luck with this project.