2 repositories
Handling language-specific tokenization, stemming, and normalization for search indexing across different languages.
Distinct from Language Variant Support: the existing candidate categories focus on programming language syntax or infrastructure SDKs, and none of them covers general natural language processing for search.
Explore 2 awesome GitHub repositories matching data & databases · Multilingual Text Processing.
lunr.js is a JavaScript full-text search library and client-side search engine. It builds in-memory search indexes for fast keyword retrieval and ranked document matching in browser or Node.js environments. The index is JSON-serializable, so it can be converted to and from JSON to store and distribute pre-built search data. This enables search on static websites by indexing content into portable files. The system supports advanced querying capabilities, including fuzzy text matching to account for typos, field-scoped i
Provides specialized processing for different languages to handle stemming and normalization during indexing and search.
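This is a full-text search engine and enterprise search infrastructure designed for indexing and retrieving large document collections. It provides a comprehensive framework for information discovery using ranked results and linguistic analysis. The system combines high-dimensional vector similarity search with traditional full-text retrieval for semantic retrieval. It stands out through its support for geospatial data retrieval, multilingual text processing, and search suggestion workflows that include typo-tolerant query completion and spell checking. The platform covers a broad range of search and indexing capabilities, including complex query execution, faceted count aggregation, and result grouping. It handles text analysis through tokenization and normalization, and provides specialized tools for document joins, search hit highlighting, and custom scoring based on recency and distance. A Python search interface exposes indexing and query functionality to external programming environments.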
Handles language-specific tokenization, stemming, and normalization to ensure accurate search results across different languages.
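As a rough illustration of what language-specific tokenization, stemming, and normalization can mean in practice, here is a minimal Python sketch. It is not taken from either repository listed above: the `STOPWORDS` and `SUFFIXES` tables, the naive suffix-stripping stemmer, and the `analyze` function are all hypothetical simplifications, and real analyzers apply far richer per-language rules.

```python
import re
import unicodedata

# Hypothetical per-language tables, for illustration only.
STOPWORDS = {
    "en": {"the", "a", "of", "and"},
    "de": {"der", "die", "das", "und"},
}
SUFFIXES = {
    "en": ("ing", "ed", "s"),
    "de": ("en", "er", "e"),
}


def normalize(text):
    """Case-fold the text and strip diacritics via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split text into runs of word characters."""
    return re.findall(r"\w+", text)


def stem(token, lang):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES.get(lang, ()):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text, lang):
    """Normalize, tokenize, drop stopwords, then stem, using rules for `lang`."""
    stop = STOPWORDS.get(lang, set())
    return [stem(t, lang) for t in tokenize(normalize(text)) if t not in stop]


print(analyze("The indexing of documents", "en"))  # ['index', 'document']
print(analyze("Die Bücher und Häuser", "de"))      # ['buch', 'haus']
```

Running the same analysis at index time and at query time is what lets a query such as "document" match text containing "documents"; the language code selects which stopword and suffix rules apply.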