ECDICT is a collection of structured linguistic datasets built around an English-Chinese dictionary database. It provides bilingual word definitions, phonetic symbols, and parts of speech, alongside a bilingual geographic gazetteer that maps English place names to their Chinese equivalents. These resources are distributed in CSV, SQL, StarDict, and MDX formats.
The project distinguishes itself by integrating corpus data: word frequency rankings and academic syllabus markers derived from national corpora. This lets it serve as an educational vocabulary reference, tagging core words and professional terminology so that entries align with academic exam requirements and learning levels.
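As a rough illustration of how frequency rankings and syllabus markers can be combined, the sketch below selects the words tagged for one exam and orders them from most to least frequent. The column names (`word`, `tag`, `frq`), the tag labels, and the sample rows are assumptions made for this example, not a description of the project's actual schema or data.

```python
import csv
import io

# Illustrative rows only. Column names, tag labels, and ranks are assumptions.
# "frq" is treated as a frequency rank where a smaller number means more frequent.
SAMPLE = """word,tag,frq
abandon,cet4 cet6,2500
the,zk gk,1
quantum,,8000
ability,zk gk cet4,700
"""


def words_for_exam(csv_text, exam):
    """Return the words tagged with `exam`, most frequent first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # Tags are assumed to be space-separated within a single column.
    selected = [row for row in rows if exam in row["tag"].split()]
    selected.sort(key=lambda row: int(row["frq"]))
    return [row["word"] for row in selected]


cet4_words = words_for_exam(SAMPLE, "cet4")
```

Keeping the tags as a space-separated string in one column makes the CSV easy to edit by hand, at the cost of a split on every query; a real vocabulary tool might index the tags ahead of time.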
The system supports bilingual database management and conversion of the dictionary between its storage formats. It also covers lexical lookup of phrasal verbs, slang, and idioms, and includes vocabulary analysis tools for identifying core words and annotating word frequency.
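A format conversion of this kind can be sketched with the Python standard library alone. The example below loads CSV rows into an SQLite table; the table name `entries`, the two columns, and the sample rows are hypothetical and chosen only to keep the sketch short, so it should not be read as the project's own conversion code.

```python
import csv
import io
import sqlite3

# Illustrative input. The real export may have many more columns.
SAMPLE_CSV = """word,translation
apple,苹果
take off,起飞
"""


def csv_to_sqlite(csv_text, conn):
    """Copy CSV rows into an SQLite table and return the number of rows read."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "word TEXT PRIMARY KEY, translation TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["word"], row["translation"]) for row in reader]
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO entries (word, translation) VALUES (?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
count = csv_to_sqlite(SAMPLE_CSV, conn)
```

Using `INSERT OR REPLACE` keeps the conversion repeatable: running it twice on the same input leaves one row per headword instead of failing on the primary key.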
Search and indexing rely on two mechanisms: fuzzy word matching, which tolerates variations in how a query is written, and lemma-based resolution, which maps inflected word forms to their base dictionary entries.
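The two lookup mechanisms can be sketched as follows. This is a minimal illustration under stated assumptions: "fuzzy" is interpreted here as matching on a normalized key (lowercased, with everything except letters and digits removed), and the lemma table is a small hand-written mapping. The names `ENTRIES`, `LEMMAS`, `normalize`, and `lookup` are hypothetical and do not refer to the project's actual API.

```python
import re

# Hypothetical miniature dictionary: headword -> Chinese gloss (illustrative data).
ENTRIES = {
    "take off": "起飞；脱下",
    "run": "跑；运行",
    "long-term": "长期的",
}

# Hypothetical inflection table: inflected form -> base headword.
LEMMAS = {
    "ran": "run",
    "running": "run",
    "runs": "run",
}


def normalize(word):
    """Lowercase the word and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", word.lower())


# Index from normalized key to the headword's canonical spelling.
FUZZY_INDEX = {normalize(headword): headword for headword in ENTRIES}


def lookup(query):
    """Return (headword, gloss) or None, trying exact, lemma, then fuzzy match."""
    q = query.strip()
    if q in ENTRIES:
        return q, ENTRIES[q]
    # Lemma resolution: map an inflected form back to its base entry.
    base = LEMMAS.get(q.lower())
    if base in ENTRIES:
        return base, ENTRIES[base]
    # Fuzzy match: compare normalized keys so spacing, hyphens, and case are ignored.
    headword = FUZZY_INDEX.get(normalize(q))
    if headword is not None:
        return headword, ENTRIES[headword]
    return None
```

With this ordering, `lookup("Running")` resolves through the lemma table to the `run` entry, while `lookup("Long Term")` and `lookup("take-off")` fall through to the normalized index. Edit-distance matching (for example `difflib.get_close_matches`) would be a different and more permissive reading of "fuzzy" than the one sketched here.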