The Test of Chinese as a Foreign Language (TOCFL) (Chinese: 華語文能力測驗; pinyin: Huáyǔwén Nénglì Cèyàn) is a standardized test of Taiwanese Mandarin language proficiency for non-native speakers, including foreign students. Many vocabulary lists are available online, but a lot of them are incomplete, outdated, or behind paywalls.
This repo provides a dataset based on the following files, which are linked from the official TOCFL website:
coct.naer.edu.tw/download/tech_report
Taiwan Chinese Language Proficiency Benchmark Vocabulary List_111-11-14.xlsx
The vocabulary list is particularly useful: it gives frequency for both written and spoken usage. It also provides pinyin, which distinguishes entries that share the same character but differ in pronunciation and meaning.
Taiwan Chinese Language Proficiency Benchmark Chinese Character List_111-09-20.xlsx
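To illustrate why the pinyin column matters, here is a minimal sketch that groups entries by word and picks out those with more than one reading. The field names `word` and `pinyin`, the sample rows, and both helper functions are assumptions made for this example; they are not the dataset's actual column names or API.

```python
from collections import defaultdict

# Hypothetical sample rows. The field names "word" and "pinyin" are
# assumptions for illustration, not the dataset's actual column names.
ROWS = [
    {"word": "行", "pinyin": "xíng"},
    {"word": "行", "pinyin": "háng"},
    {"word": "長", "pinyin": "cháng"},
    {"word": "長", "pinyin": "zhǎng"},
    {"word": "我", "pinyin": "wǒ"},
]


def group_readings(rows):
    """Map each word to its distinct pinyin readings, in first-seen order."""
    readings = defaultdict(list)
    for row in rows:
        if row["pinyin"] not in readings[row["word"]]:
            readings[row["word"]].append(row["pinyin"])
    return dict(readings)


def heteronyms(rows):
    """Return the words that appear with more than one pinyin reading."""
    return [word for word, r in group_readings(rows).items() if len(r) > 1]


if __name__ == "__main__":
    grouped = group_readings(ROWS)
    for word in heteronyms(ROWS):
        print(word, ", ".join(grouped[word]))
```

Keying lookups on the pair (word, pinyin) rather than on the word alone keeps such entries separate, so each reading can carry its own frequency and level.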
https://github.com/tomcumming/tocfl-word-list also provides TOCFL lists, but they seem to be incomplete or outdated, and the source used to compile them is not entirely clear.