Word cloud produced using Wordle (SydTV-Std, untagged, without common English words)
On this page I have made different kinds of frequency lists available for download, so that other researchers can compare their own data with SydTV and SydTV-Std (a partially standardized version of SydTV). Alternatively, you can use this online interface to undertake frequency analyses of the corpora. To help with calculating normalized frequencies, Table 1 shows how corpus size varies with the token definition used.
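Normalizing is simple arithmetic. You divide a raw count by the corpus size and scale the result to a fixed base, and per million tokens is a common choice. Below is a minimal Python sketch of this. The token totals in it are hypothetical placeholders, not the SydTV figures from Table 1. The point is that the denominator should come from the same token definition that produced the raw frequency.

```python
def per_million(raw_freq: int, corpus_tokens: int) -> float:
    """Return raw_freq normalized to occurrences per million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    # Multiply before dividing so that round figures stay exact.
    return raw_freq * 1_000_000 / corpus_tokens


# Hypothetical token totals for illustration only (not taken from Table 1):
# the same raw count gives different rates under different token definitions.
print(per_million(50, 250_000))  # 200.0
print(per_million(50, 200_000))  # 250.0
```

Because corpus size depends on the token definition, a normalized frequency is only comparable with another if both use the same definition.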
Frequency lists produced using WordSmith Tools (Version 7):
- Hyphens do not separate words; apostrophe (’) not allowed within words
- Hyphens separate words; apostrophe (’) not allowed within words
- Hyphens do not separate words; apostrophe (’) allowed within words
- Hyphens separate words; apostrophe (’) allowed within words
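The Python sketch below shows why these four settings produce different token totals. It is a regex tokenizer with one switch for hyphens and one for apostrophes. It only approximates the idea and is not WordSmith's actual tokenization. The function `tokenize`, its parameters and the sample sentence are all hypothetical.

```python
import re

# A "letter" is any Unicode letter: word characters minus digits and underscore.
LETTER = r"[^\W\d_]"


def tokenize(text: str, hyphen_splits: bool, apostrophe_in_word: bool) -> list[str]:
    """Hypothetical tokenizer mimicking the two WordSmith-style switches."""
    joiners = ""
    if not hyphen_splits:
        joiners += "-"  # keep hyphenated compounds as one token
    if apostrophe_in_word:
        joiners += "'\u2019"  # straight and typographic apostrophes
    if joiners:
        # A token is a run of letters, optionally linked to further runs by a joiner.
        pattern = rf"{LETTER}+(?:[{re.escape(joiners)}]{LETTER}+)*"
    else:
        pattern = rf"{LETTER}+"
    return re.findall(pattern, text.lower())


sample = "Let's get out of here, it's a well-known line."
for hyphen_splits in (False, True):
    for apostrophe_in_word in (False, True):
        count = len(tokenize(sample, hyphen_splits, apostrophe_in_word))
        print(hyphen_splits, apostrophe_in_word, count)
# False False 11
# False True 9
# True False 12
# True True 10
```

Even in one short sentence the four settings give four different token counts. This is why the corpus size in Table 1 depends on the token definition.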
Frequency lists produced using AntConc (Version 3.4.4) – text files
- Token definition: Letter
- Token definition: Letter, with the apostrophe (’) added via “Append Following Definition”
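A frequency list is a count of each token type, ranked by frequency. The Python sketch below contrasts the two definitions above: letters only, and letters plus the apostrophe as an additional token character. It illustrates the principle and is not AntConc's own code. The names `LETTER`, `LETTER_PLUS_APOSTROPHE` and `frequency_list`, and the sample text, are hypothetical.

```python
import re
from collections import Counter

# Hypothetical token patterns: letters only vs. letters plus apostrophes.
LETTER = re.compile(r"[^\W\d_]+")
LETTER_PLUS_APOSTROPHE = re.compile(r"(?:[^\W\d_]|['\u2019])+")


def frequency_list(text: str, token_re: re.Pattern) -> list[tuple[int, int, str]]:
    """Return (rank, frequency, word) rows, most frequent first."""
    counts = Counter(token_re.findall(text.lower()))
    # Sort by descending frequency, breaking ties alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, freq, word) for rank, (word, freq) in enumerate(ranked, start=1)]


text = "Let's go. Let's get out of here!"
print(frequency_list(text, LETTER)[:2])                  # [(1, 2, 'let'), (2, 2, 's')]
print(frequency_list(text, LETTER_PLUS_APOSTROPHE)[:1])  # [(1, 2, "let's")]
```

Treating the apostrophe as a token character has a side effect. Quotation marks typed with the same character end up attached to neighbouring words, so a list built this way may need some manual checking.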
Frequency lists produced using Sketch Engine – CSV files
- Words, lemmas and lempos, i.e. lemma plus part-of-speech tag (SydTV-Std; default settings; minimum frequency changed to 1)
- 2-grams to 6-grams (SydTV-Std; default settings; minimum frequency changed to 3)
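An n-gram list can be understood as a count of every contiguous sequence of n tokens, keeping only the sequences that reach a minimum frequency. The Python sketch below does this for n = 2 to 6 with a threshold of 3, matching the settings above. It does not reproduce Sketch Engine's own procedure, for example its handling of sentence boundaries. The function `ngram_counts` and the sample tokens are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens: list[str], n_min: int = 2, n_max: int = 6,
                 min_freq: int = 3) -> dict[str, int]:
    """Count all n-grams with n_min <= n <= n_max; keep those with freq >= min_freq."""
    counts: Counter = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of length n across the token sequence.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


# Hypothetical token sequence: one short utterance repeated three times.
tokens = ("let's get out of here now " * 3).split()
result = ngram_counts(tokens)
print(result["out of here"])  # 3
print("now let's" in result)  # False
```

In this example "now let's" spans the joins between repeats. It occurs only twice, so the minimum frequency of 3 filters it out.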
Additional frequency lists, based on a more limited dataset, are available on my website, which also discusses selected items. There is considerable overlap between these lists, for example in the trigram “out of here”. It sometimes occurs in the utterance “Let’s get … out of here”, which is often cited as the most clichéd line or stock phrase in cinema (see montage below).