Wiktionary:List of the most common words
English
Most common words (based on TV and film dialogue)
Here are frequency lists comparable to the Gutenberg ones, but based on 29,213,800 words from TV and movie scripts and transcripts.
Here's a fuller explanation of how the list was generated and its limitations: Wiktionary:Frequency lists/TV/2006/explanation.
Here are the top 100 words (from TV scripts) in alphabetical order:
- a · about · all · and · are · as · at · back · be · because · been · but · can · can't · come · could · did · didn't · do · don't · for · from · get · go · going · good · got · had · have · he · her · here · he's · hey · him · his · how · I · if · I'll · I'm · in · is · it · it's · just · know · like · look · me · mean · my · no · not · now · of · oh · OK · okay · on · one · or · out · really · right · say · see · she · so · some · something · tell · that · that's · the · then · there · they · think · this · time · to · up · want · was · we · well · were · what · when · who · why · will · with · would · yeah · yes · you · your · you're
Here they are in frequency order:
- 1-1000 · 1001-2000 · 2001-3000 · 3001-4000 · 4001-5000 · 5001-6000 · 6001-7000 · 7001-8000 · 8001-9000 · 9001-10000
From the 10,001st to the 40,000th:
- 10001-12000 · 12001-14000 · 14001-16000 · 16001-18000 · 18001-20000 · 20001-22000 · 22001-24000 · 24001-26000 · 26001-28000 · 28001-30000 · 30001-32000 · 32001-34000 · 34001-36000 · 36001-38000 · 38001-40000
- 40001-41284 (the dregs that were tied for 40,000th place)
That will probably be the end of these lists. The 41,284 words listed are a third of all the unique words, and each of the remaining words was used five or fewer times.
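The last range runs past 40,000 because every word tied with the 40,000th word is kept. Here is a minimal Python sketch of that cutoff rule. It assumes the counts are already in a `collections.Counter`. The function `top_n_with_ties` and the toy counts are hypothetical; this is not the script that produced these lists.

```python
from collections import Counter


def top_n_with_ties(counts: Counter, n: int) -> list[tuple[str, int]]:
    """Return the n most frequent words plus any words tied with the n-th one."""
    if n <= 0:
        return []
    # Sort by descending count; ties are ordered alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    if len(ranked) <= n:
        return ranked
    cutoff = ranked[n - 1][1]  # count of the n-th word
    end = n
    # Keep going past position n while words share the cutoff count.
    while end < len(ranked) and ranked[end][1] == cutoff:
        end += 1
    return ranked[:end]


# Toy counts: "hey", "okay" and "yeah" tie for 3rd place, so all three are kept.
counts = Counter({"you": 9, "the": 7, "hey": 3, "yeah": 3, "okay": 3, "oh": 1})
print(top_n_with_ties(counts, 3))
# → [('you', 9), ('the', 7), ('hey', 3), ('okay', 3), ('yeah', 3)]
```

The alphabetical tie-break is an arbitrary choice. Its only purpose is to make the output order reproducible.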
Most common words (Gutenberg)
These lists contain the most frequent words found by a simple, straight (obvious) frequency count of all the books on Project Gutenberg. The list of books was downloaded in July 2005 and synchronized monthly with rsync thereafter. The words are mostly English, with some other languages represented to a lesser extent. Many Project Gutenberg books are scanned once their copyright expires, which typically means editions published before 1923, so the language does not reflect modern usage. For example, "hath" is listed as the 534th-most-common word. Also, with 24,000+ books, the Project Gutenberg boilerplate notice appears in every one of them, which inflates the counts of the words it contains.
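A straight frequency count of this kind can be sketched in Python as shown below. Several details are assumptions for illustration, not the original tooling:

- the word pattern (runs of ASCII letters with optional internal apostrophes);
- the `.txt` corpus layout;
- the function names.

Case is left untouched, which the capitalized entries "I" and "Mr" in the list suggest but do not confirm. A count like this includes any text repeated in every file, such as the boilerplate notice, once per file.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed word definition: ASCII letters, optionally joined by internal apostrophes.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def count_words(texts) -> Counter:
    """Straight frequency count over an iterable of strings; case is kept as-is."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text))
    return counts


def count_corpus(root: str) -> Counter:
    """Count words in every .txt file under root (hypothetical corpus layout)."""
    paths = sorted(Path(root).rglob("*.txt"))
    texts = (p.read_text(encoding="utf-8", errors="replace") for p in paths)
    return count_words(texts)


sample = ["The man said that the time had come.", "I know the man."]
counts = count_words(sample)
print(counts["the"], counts["The"], counts["man"])  # → 2 1 2
```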
Here are the top 100 words (from Project Gutenberg texts) in alphabetical order:
- a · about · after · all · an · and · any · are · as · at · be · been · before · but · by · can · could · did · do · down · first · for · from · good · great · had · has · have · he · her · him · his · I · if · in · into · is · it · its · know · like · little · made · man · may · me · men · more · Mr · much · must · my · no · not · now · of · on · one · only · or · other · our · out · over · said · see · she · should · so · some · such · than · that · the · their · them · then · there · these · they · this · time · to · two · up · upon · us · very · was · we · were · what · when · which · who · will · with · would · you · your
- These wikified terms are intended to be copied to other-language Wiktionaries. If you copy them, please add an interwiki link on this page.
- New list as of 4/16/2006:
- Wiktionary:Frequency lists/PG/2006/04/1-10000
- Wiktionary:Frequency lists/PG/2006/04/10001-20000
- Wiktionary:Frequency lists/PG/2006/04/20001-30000
- Wiktionary:Frequency lists/PG/2006/04/30001-40000
- New list as of 10/10/2005:
- The same list divided by thousand words:
- 1-1000 1001-2000 2001-3000 3001-4000 4001-5000 5001-6000 6001-7000 7001-8000 8001-9000 9001-10000
- more to come...
- Older lists
- Most common words, in order of rank:
- Wiktionary:Frequency lists/Project Gutenberg 1-10000
- Wiktionary:Frequency lists/Project Gutenberg 10001-20000
- Wiktionary:Frequency lists/Project Gutenberg 20001-30000
- Wiktionary:Frequency lists/Project Gutenberg 30001-40000
- Wiktionary:Frequency lists/Project Gutenberg 40001-50000
- Wiktionary:Frequency lists/Project Gutenberg 50001-60000
- Wiktionary:Frequency lists/Project Gutenberg 60001-70000
- Wiktionary:Frequency lists/Project Gutenberg 70001-80000
- Wiktionary:Frequency lists/Project Gutenberg 80001-90000
- Wiktionary:Frequency lists/Project Gutenberg 90001-100000
- Approximately 24,197 files and 1,712,082,956 words (an average of 70,756.0 words per file), from which about 9,053,310 unique "words" were gleaned.
- Words that already had a page in the current copy of Wiktionary were then removed from the straight frequency count, even if that page was only a redirect.
- With somewhat different filtering/selection criteria:
- The latest version can always be found at:
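Removing words that already have a Wiktionary page is a set-membership filter over page titles. Here is a minimal sketch. It assumes the titles of all existing pages, redirects included, have already been loaded into a Python set. `filter_missing` and the toy data are hypothetical.

```python
from collections import Counter


def filter_missing(counts: Counter, existing_titles: set[str]) -> list[tuple[str, int]]:
    """Drop words that already have a Wiktionary page; keep frequency order.

    existing_titles must include redirect titles as well, because words whose
    only page was a redirect were removed too.
    """
    return [(word, n) for word, n in counts.most_common() if word not in existing_titles]


# Toy data. Set lookup is case-sensitive, like Wiktionary page titles.
counts = Counter({"the": 100, "hath": 40, "thee": 30, "upon": 25})
existing_titles = {"the", "thee", "upon"}
print(filter_missing(counts, existing_titles))  # → [('hath', 40)]
```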
Most common words in contemporary fiction
The 2,000 most common words in contemporary fiction can be found here:
The 2,000 most common words in contemporary fiction can also be found here, divided into 60 subject categories.
Unlike most of the lists on this page, this one groups the regular inflected forms of a word together under a single lemma.
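Grouping inflected forms under a shared lemma can be sketched with a form-to-lemma lookup. The small `LEMMA_OF` table below is a hypothetical stand-in. How the fiction list actually groups forms is not described here, and a real list would need a full lexicon or lemmatizer.

```python
from collections import Counter

# Hypothetical form-to-lemma table for a few regular English inflections.
LEMMA_OF = {
    "walks": "walk",
    "walked": "walk",
    "walking": "walk",
    "books": "book",
}


def lump_by_lemma(form_counts: Counter) -> Counter:
    """Add each form's count to its lemma; forms not in the table count as their own lemma."""
    lemma_counts = Counter()
    for form, n in form_counts.items():
        lemma_counts[LEMMA_OF.get(form, form)] += n
    return lemma_counts


forms = Counter({"walk": 5, "walked": 4, "walking": 2, "book": 3, "books": 6})
print(lump_by_lemma(forms))  # → Counter({'walk': 11, 'book': 9})
```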
Top English words lists
- Category:100 English basic words
- Category:200 English basic words
- Category:1000 English basic words
- Complete Shakespeare wordlist | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
British National Corpus: most frequent Word Families
See the list on Simple English Wiktionary.
Academic Word List by word family
See the list on Simple English Wiktionary.
Other languages
The thirteen most frequent Dutch words
From Max Havelaar (numbers in parentheses are occurrence counts):
- de (4770)
- en (2709)
- het / 't (2469)
- van (2259)
- ik (1999)
- te (1935)
- dat (1875)
- die (1807)
- in (1639)
- een (1637)
- hij (1328)
- niet (1162)
- zijn (1049)
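The "het / 't" entry folds a clitic spelling into its full form before ranking. Below is a minimal sketch of that merge plus a top-k cut. It assumes the text is already split into tokens and that tokens are compared case-insensitively. The `VARIANTS` table and `top_words` are hypothetical, and the source does not say how the Max Havelaar counts were actually produced.

```python
from collections import Counter

# Hypothetical variant table: the clitic spelling "'t" is counted as "het".
VARIANTS = {"'t": "het"}


def top_words(tokens, k=13):
    """Case-insensitive token count with listed variants merged; returns the top k."""
    counts = Counter(VARIANTS.get(tok.lower(), tok.lower()) for tok in tokens)
    return counts.most_common(k)


tokens = ["Het", "huis", "van", "de", "man", "is", "'t", "mooiste", "van", "het", "dorp"]
print(top_words(tokens, k=2))  # → [('het', 3), ('van', 2)]
```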
French words
Frequency lists from http://wortschatz.uni-leipzig.de/html/wliste.html, used with authorization from the laboratory.
- top 2000 words
- Wiktionary:French frequency lists/2001-4000
- Wiktionary:French frequency lists/4001-6000
- Wiktionary:French frequency lists/6001-8000
- Wiktionary:French frequency lists/8001-10000
The most used Galician words
From the work of Ana Bugarín presented at the ILG Lexicographical Symposium 2006 (base forms):
- o (definite article)
- e (conjunction)
- de (preposition)
- do (contraction)
- que (relator)
- ser (verb)
- un (indefinite article)
- no (contraction)
- se (reflexive pronoun/conditional conjunction)
- non (negation adverb)
- ó (contraction)
- a (preposition)
- en (preposition)
- ter (verb)
- para (preposition)
- por (preposition)
From the CORGA corpus (lexical items):
The rest, up to the 1000th position, is available from the CORGA corpus.
German Wikipedia words
Hungarian words
Top 100,000 words in Hungarian text: http://mokk.bme.hu/resources/webcorpus
Hungarian frequency list 1-10000
Italian frequency list
Top 1000 Italian words from subtitles:
Top 200 Korean words
Spanish frequency lists
Top 10000 Spanish words from subtitles:
Swedish frequency lists
- Wiktionary:Frequency lists/top 2000 Swedish Wikipedia words
- /Swedish (similar, but not identical)
Thai lists
- If this is just "basic" words, not statistically the "most frequent" words, it shouldn't be here, it should be in the Appendix namespace only. --Connel MacKenzie 20:59, 26 December 2006 (UTC)