2 comments:
This reminds me of the translation algorithms on Star Trek. The problem is that human language has too many rules, and even more exceptions, to handle without a large sample from which to generate a large rulebook.
I don't know what the minimum data size would be for a project like this. However, having known some linguists, I can say it has to be not only large but also diverse in specific ways to capture a usable picture of the language.
As I recall, that was the 'big reveal' in A Canticle for Leibowitz.