Word List
Corpus:
Amharic WIC Corpus
Amharic Web 2013-17 (amWaC17)
Czech-Norwegian Parallel Corpus, Czech
Czech-Norwegian Parallel Corpus, Norwegian
Czech Web (czTenTen16) [2016, 2015]
Norwegian Web 2015 (Bokmål)
Norwegian Web 2015 (Nynorsk)
Oromo spoken (Text Laboratory, University of Oslo)
Oromo WaC [2016]
Somali WaC [2016]
Tigrinya WaC [2016]
Subcorpus:
None (whole corpus)
Web_2013
Web_2015
Web_2016
Web_2017
Web_Domain_.et_Ethiopia
Wikipedia_2017
Info
Create New
Search attributes:
word
tag
word (lowercase)
collocations
Second level domain
Top level domain
Domain
doc.url
doc.length
doc.crawl_date
doc.lang_diff
doc.wordcount
p.heading
use n-grams. Value of n: from (2 | 3 | 4 | 5) to (2 | 3 | 4 | 5)
hide/nest sub-n-grams
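As a rough illustration of what the n-gram option computes, the sketch below counts every n-gram whose length lies between a "from" and a "to" value. The function name `ngram_counts` and the whitespace tokenization are assumptions made for this example; they are not taken from the tool itself, and the hide/nest sub-n-grams behaviour is not modelled.

```python
from collections import Counter


def ngram_counts(tokens, n_from=2, n_to=3):
    """Count all n-grams of length n_from..n_to (inclusive) in a token list.

    Hypothetical helper for illustration only; the real tool works on an
    indexed corpus rather than an in-memory list.
    """
    counts = Counter()
    for n in range(n_from, n_to + 1):
        # Slide a window of width n across the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


tokens = "a b a b c".split()
counts = ngram_counts(tokens, n_from=2, n_to=3)
for ngram, freq in counts.most_common():
    print(" ".join(ngram), freq)
```

Setting "from" and "to" to the same value restricts the list to a single n-gram length.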
Filter options:
Filter word list by: Regular expression:
Minimum frequency:
Maximum frequency:
(0 = no maximum frequency)
Whitelist:
Blacklist:
Format
Include non-words
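The filter options above can be read as a chain of independent tests applied to each word and its frequency. The following is a minimal sketch under stated assumptions: the function name `filter_word_list` is hypothetical, the regular expression is assumed to match the whole word, and the whitelist and blacklist are assumed to be plain sets of word forms. Only the rule that a maximum frequency of 0 means "no maximum" comes from the form itself.

```python
import re
from collections import Counter


def filter_word_list(counts, pattern=None, min_freq=0, max_freq=0,
                     whitelist=None, blacklist=None):
    """Return (word, frequency) pairs that pass every filter.

    Hypothetical helper for illustration; the tool's actual matching
    rules may differ (for example, partial instead of full regex match).
    """
    regex = re.compile(pattern) if pattern else None
    kept = []
    for word, freq in counts.items():
        if regex and not regex.fullmatch(word):
            continue  # assumption: the pattern must match the whole word
        if freq < min_freq:
            continue
        if max_freq and freq > max_freq:
            continue  # max_freq == 0 means no maximum frequency
        if whitelist is not None and word not in whitelist:
            continue
        if blacklist is not None and word in blacklist:
            continue
        kept.append((word, freq))
    # Most frequent first, ties broken alphabetically.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


counts = Counter("the cat sat on the mat the end".split())
result = filter_word_list(counts, pattern=r".*t", min_freq=1,
                          max_freq=0, blacklist={"sat"})
print(result)
```

Leaving every option at its default returns the full list sorted by frequency, which corresponds to an unfiltered word list.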
Output options:
Frequency figures:
Hit counts
Document counts
ARF
Output type:
Simple
Keywords
Reference (sub)corpus:
Amharic WIC Corpus
Amharic Web 2013-17 (amWaC17)
Czech-Norwegian Parallel Corpus, Czech
Czech-Norwegian Parallel Corpus, Norwegian
Czech Web (czTenTen16) [2016, 2015]
Norwegian Web 2015 (Bokmål)
Norwegian Web 2015 (Nynorsk)
Oromo spoken (Text Laboratory, University of Oslo)
Oromo WaC [2016]
Somali WaC [2016]
Tigrinya WaC [2016]
(whole corpus)
the rest of the corpus
Web_2013
Web_2015
Web_2016
Web_2017
Web_Domain_.et_Ethiopia
Wikipedia_2017
Change output attribute(s):
Select
Minimum frequency:
You can select one or more output attributes. Please note that this option can be time-consuming.
Make Word List