Questions tagged [computational-linguistics]
A branch of science that uses computers and mathematical methods to construct and investigate linguistic theory. Its technological and algorithmic implementation is called NLP.
564 questions
1
vote
3
answers
188
views
Languages with Nearly Uniform Character Frequencies
I am a statistician working on a curriculum with a chapter on randomness. To illustrate some concepts of randomness (namely Shannon entropy and MDL), I set up a hypothetical scenario with deciphering ...
4
votes
1
answer
145
views
Measure of efficiency of a language?
Is there a measure of a language's efficiency?
Such as a ratio:
(information content) ÷ (phonemes to express that information)
I'm not asking about spoken language specifically (as in "Speech ...
-4
votes
2
answers
94
views
Does the apparent creativity of AI, like GPT, challenge Chomsky's claim that only humans possess a uniquely creative capacity for language?
Chomsky argued that the creative use of language is a uniquely human ability, one that machines inherently lack. In Cartesian Linguistics, he built on Descartes' notion that human language, due to its ...
1
vote
1
answer
93
views
Is something similar to orthogonality defined in linguistics?
In mathematics, two vectors are orthogonal to each other if you cannot produce one from the other using linear operations (there is a more precise definition, but this is the simplest).
For example, &...
0
votes
1
answer
85
views
Is there an error-free web-based parser that automatically draws a syntax tree from an English text?
The only web-based automatic parser I know is CoreNLP version 4.5.5, where you can put in an English text and get a constituency tree (when you select 'parts-of-speech' as Annotations and click '...
0
votes
0
answers
88
views
Insight into basic machine translation error from major email service
I recently ordered a product through a Japanese company (I live in Tokyo), and received an email response that due to a recent backlog of orders I should expect my product shipped within 2-3 weeks (in ...
2
votes
1
answer
85
views
Information rate of ultra-information-sparse languages
Pellegrino 2011
Also Pellegrino
YoonMi
These well-known studies found that information density (ID) correlates inversely with speech rate (SR), so that the information rate (IR) of languages clusters around ...
3
votes
0
answers
39
views
List of counterexamples and statistics for Greenberg's universals
I could not find a list of counterexamples to, or statistics on, Greenberg's linguistic universals.
For some of the universals I could find relevant information on WALS; for others I could not find anything....
0
votes
0
answers
74
views
What language has a small difference between the word length of advanced vs. basic vocabulary?
word length on Swadesh list
Zipf's Law
Basic vocabulary is used more frequently, and its words are generally shorter (in number of syllables or segments) than those of advanced, technical, or formal vocabulary.
In English:
...
2
votes
1
answer
201
views
Did big languages generally have a net loss of inflectional morphology in the past 1-3 millennia and small languages the other way round?
a.
R. M. W. Dixon (1998) theorizes that languages normally evolve in a cycle from fusional to analytic to agglutinative to fusional again like a clock. There are two opposing forces: one reduces ...
1
vote
0
answers
27
views
Specifically which corpora were used by the Ofsted Research Team in designing the new curricula for MFL GCSE 2026?
Specifically which corpora were used by the Ofsted Research Team in designing the new curricula for MFL GCSE 2026?
0
votes
0
answers
97
views
What is the information density of factual knowledge in large bodies of English text?
An ML paper I was reading mentioned an estimate of no more than 0.7 bits per word, in footnote 4:
As of February 1, 2024, English Wikipedia contains a total of 4.5 billion words [...] We estimate ...
0
votes
0
answers
62
views
What is the most accurate way to parse a text so that we can get the characters and the list of sentences that refer to each character?
I'm trying to come up with a method that will take a text and parse it so that we can get all the characters and a list of the sentences from the text that have references to the character (either ...
-5
votes
1
answer
250
views
How has computational linguistics contributed to the preservation of endangered languages?
Computational tools and techniques have been applied to the field of historical linguistics, aiding in the analysis of old or endangered languages. This has contributed to the documentation and ...
0
votes
0
answers
71
views
How much text data an AI chatbot is based on vs. how accurate its language use is
This question is motivated by a question I read on another online forum, to which the answerers said that when they tested ChatGPT's Hindi, it made grammatical errors all the time and was also trash ...