ECDICT

ECDICT is a collection of structured linguistic datasets and an English-Chinese dictionary database. It provides bilingual word definitions, phonetic symbols, and parts of speech, alongside a bilingual geographic gazetteer that maps English place names to Chinese equivalents. These resources are exported in CSV, SQL, StarDict, and MDX formats.

The project distinguishes itself by integrating a linguistic corpus dataset: word frequency rankings derived from national corpora, together with academic syllabus markers. It functions as an educational vocabulary reference, tagging core words and professional terminology to align with academic exam requirements and learning levels.

The system supports bilingual database management and dictionary format conversion across various storage types. It includes capabilities for lexical lookup of phrasal verbs, slang, and idioms, as well as vocabulary analysis tools for core word identification and frequency annotation.
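As a rough illustration of converting between storage types, the sketch below loads a small CSV export into an in-memory SQLite table and looks a word up. The column names (`word`, `phonetic`, `translation`, `pos`), the table name `entries`, and the helper functions are assumptions made for this example, not the project's actual schema or API.

```python
import csv
import io
import sqlite3

# Hypothetical column layout; the real export may use different names.
SAMPLE_CSV = """word,phonetic,translation,pos
apple,ˈæpl,n. 苹果,n
give up,,放弃,v
"""


def csv_to_sqlite(csv_text, conn):
    """Load CSV rows into a SQLite table named `entries` (illustrative schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entries ("
        "word TEXT PRIMARY KEY COLLATE NOCASE, "
        "phonetic TEXT, translation TEXT, pos TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["word"], r["phonetic"], r["translation"], r["pos"]) for r in reader]
    # INSERT OR REPLACE keeps the conversion idempotent when re-run.
    conn.executemany("INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


def lookup(conn, word):
    """Return (word, phonetic, translation, pos) or None if the word is absent."""
    cur = conn.execute(
        "SELECT word, phonetic, translation, pos FROM entries WHERE word = ?",
        (word,),
    )
    return cur.fetchone()


conn = sqlite3.connect(":memory:")
count = csv_to_sqlite(SAMPLE_CSV, conn)
```

Declaring the key column with `COLLATE NOCASE` makes lookups case-insensitive for ASCII letters, so `lookup(conn, "Apple")` finds the `apple` row. Multi-word entries such as `give up` are stored as ordinary keys.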

Search and indexing are handled through fuzzy word matching and lemma-based resolution to map inflected word forms to their base dictionary entries.
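One way such a lookup chain could work is sketched below: try an exact match, then an inflection table, then a normalized key, and finally an approximate match. The toy entries, the `LEMMAS` table, and the `normalize` and `resolve` helpers are all invented for illustration; the approximate step uses Python's standard `difflib` as a stand-in and is not necessarily the technique the project uses.

```python
import difflib

# Toy data; the entries and the inflection table are invented for illustration.
ENTRIES = {"run": "v. 跑", "child": "n. 孩子", "long-term": "a. 长期的"}
LEMMAS = {"ran": "run", "running": "run", "children": "child"}


def normalize(text):
    """Lowercase and keep only letters and digits, so 'Long Term' matches 'long-term'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


# Map each normalized key back to its headword.
INDEX = {normalize(word): word for word in ENTRIES}


def resolve(query):
    """Return the headword for a query, or None when nothing is close enough."""
    q = query.strip().lower()
    if q in ENTRIES:        # 1. exact headword
        return q
    if q in LEMMAS:         # 2. inflected form -> base form
        return LEMMAS[q]
    key = normalize(query)
    if key in INDEX:        # 3. match after stripping punctuation and spaces
        return INDEX[key]
    # 4. approximate match on the normalized keys (assumed similarity cutoff)
    close = difflib.get_close_matches(key, list(INDEX), n=1, cutoff=0.8)
    return INDEX[close[0]] if close else None
```

With this toy data, `resolve("Running")` yields `run`, `resolve("Long Term")` yields `long-term`, and a misspelling such as `chld` falls through to the approximate step and yields `child`.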

Features

Bilingual Dictionary Services - Provides English-Chinese translations and definitions for standard words, neologisms, and professional terminology.

English-Chinese Translation Resources - Provides a comprehensive database of English-Chinese bilingual word definitions, phonetic symbols, and parts of speech.

Frequency-Ordered Word Lists - Annotates dictionary entries with frequency rankings to distinguish contemporary usage from historical prevalence.

Simplified Chinese Dictionaries - Provides a comprehensive dictionary database of English headwords with phonetic symbols and Simplified Chinese definitions.

Frequency-Ordered Word Lists - Utilizes frequency rankings from national corpora and exam tags to determine word importance.

Frequency-Based Vocabularies - Filters dictionary entries using frequency data from national corpora to ensure high-utility coverage.

Vocabulary Extension Sets - Curates essential learning vocabulary through tagged lists of core words for students.

Content Metadata Tagging - Tags words with educational markers such as core status, star ratings, and syllabus requirements.

Educational Requirement Mapping - Links dictionary entries to standardized academic requirements to identify core vocabulary and exam-specific terms.

Linguistic Datasets - Provides structured linguistic datasets including word frequency rankings and academic syllabus markers.

High-Frequency Word Targeting - Prioritizes high-frequency vocabulary for teaching, based on frequency lists from national language corpora.

Foreign Language Learning - Supports language acquisition by identifying core vocabulary and aligning word lists with academic syllabi.

Academic Vocabulary Alignment - Cross-references word lists against standardized academic requirements to ensure essential terminology is included.

Educational Vocabulary References - Ships a tagged database of core words and professional terminology aligned with academic exam requirements.

Lemma Resolution Systems - Maps inflected word forms to base dictionary entries to ensure consistent lookups across grammatical variations.

Lexicon Exports - Exports linguistic data into multiple interchangeable formats including CSV, SQL, StarDict, and MDX.

Bilingual Data Management - Stores and manages bilingual dictionary entries across SQL and CSV formats for linguistic applications.

Bilingual Geographic Gazetteers - Includes a bilingual gazetteer mapping English place names and regional districts to Chinese equivalents.

Linguistic Dictionary Export - Exports linguistic data into multiple formats including StarDict, MDX, CSV, and SQL for external use.

Fuzzy Matching - Implements string normalization and fuzzy matching to resolve user input variations during dictionary lookups.

Geographic Gazetteers - Maps English place names and regional districts to their Chinese equivalents and corresponding countries.

Linguistic Database Management - Provides capabilities to query, register, update, and delete dictionary entries across CSV, SQLite, and MySQL.

Multi-Format Serializers - Converts internal database records into multiple interchangeable formats including CSV, SQL, StarDict, and MDX.

Relational Database Storage - Uses SQLite and MySQL to organize linguistic data and manage complex relationships between definitions and metadata.

Format Conversions - Transforms structured linguistic data between CSV, SQL, StarDict, and MDX formats.

Multi-Word Expression Indices - Provides specialized search and retrieval for phrasal verbs, idioms, and multi-word expressions.

Slang and Idiom Lexicons - Supplies definitions and explanations for urban slang, internet memes, and traditional proverbs.

Inflectional Dictionary Compilation - Implements lemma-based resolution to map inflected forms back to base dictionary entries.
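The frequency and tagging features listed above can be combined to build a study list. The following is a minimal sketch under stated assumptions: the `Entry` fields, the tag names `exam_a` and `exam_b`, and the convention that a rank of 0 means "no rank recorded" are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    word: str
    freq_rank: int      # 1 = most frequent; 0 = no rank recorded (assumed convention)
    tags: frozenset     # hypothetical exam or syllabus markers
    core: bool = False  # hypothetical "core word" flag


ENTRIES = [
    Entry("the", 1, frozenset({"exam_a"}), True),
    Entry("abandon", 2500, frozenset({"exam_a", "exam_b"}), True),
    Entry("zeitgeist", 0, frozenset(), False),
    Entry("analyze", 1800, frozenset({"exam_b"}), False),
]


def study_list(entries, tag, limit=None):
    """Return words carrying `tag`, most frequent first, skipping unranked words."""
    picked = [e for e in entries if tag in e.tags and e.freq_rank > 0]
    picked.sort(key=lambda e: e.freq_rank)
    words = [e.word for e in picked]
    return words if limit is None else words[:limit]
```

For example, `study_list(ENTRIES, "exam_b")` returns `["analyze", "abandon"]`, because both words carry the tag and `analyze` has the lower rank number.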

skywind3000/ECDICT
