Search language plugin

Plugin to detect the search language from the search query.

Language detection is done with the fastText library (the Python fasttext package). fastText distributes the language identification model.

The language identification model supports the following language codes (ISO-639-3):

af als am an ar arz as ast av az azb ba bar bcl be bg bh bn bo bpy br bs bxr
ca cbk ce ceb ckb co cs cv cy da de diq dsb dty dv el eml en eo es et eu fa
fi fr frr fy ga gd gl gn gom gu gv he hi hif hr hsb ht hu hy ia id ie ilo io
is it ja jbo jv ka kk km kn ko krc ku kv kw ky la lb lez li lmo lo lrc lt lv
mai mg mhr min mk ml mn mr mrj ms mt mwl my myv mzn nah nap nds ne new nl nn
no oc or os pa pam pfl pl pms pnb ps pt qu rm ro ru rue sa sah sc scn sco sd
sh si sk sl so sq sr su sv sw ta te tg th tk tl tr tt tyv ug uk ur uz vec vep
vi vls vo wa war wuu xal xmf yi yo yue zh
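
As a rough sketch of how a detection result could be turned into one of the codes above: in the Python fasttext package, model.predict returns labels of the form `__label__<code>` together with their probabilities. The helper name pick_language and the 0.3 threshold are illustrative assumptions, not the plugin’s actual code. No model is loaded here, so the snippet runs without fastText installed.

```python
from typing import Optional, Sequence

# Prefix fastText puts in front of every predicted label.
LABEL_PREFIX = "__label__"


def pick_language(labels: Sequence[str], probs: Sequence[float],
                  threshold: float = 0.3) -> Optional[str]:
    """Return the most probable language code, or None if unsure.

    `labels` and `probs` mirror what `model.predict(text, k=...)`
    returns; `threshold` is an assumed cut-off for this sketch.
    """
    if not labels:
        return None
    # Pair each label with its probability and keep the most probable one.
    label, prob = max(zip(labels, probs), key=lambda pair: pair[1])
    if prob < threshold or not label.startswith(LABEL_PREFIX):
        return None
    # Strip the prefix to obtain the bare language code, e.g. "de".
    return label[len(LABEL_PREFIX):]
```

With a loaded model, the inputs would come from something like `labels, probs = model.predict("some query", k=1)`.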

The language identification model is harmonized with SearXNG’s language (locale) model. The general conditions of SearXNG’s locale model are:

  1. SearXNG’s locale of a query is passed to searx.locales.get_engine_locale to get the language and/or region code that is used by an engine.

  2. SearXNG and most of the engines do not support all the languages of the language identification model, and there may also be discrepancies in how ISO-639-3 and ISO-639-2 codes are handled (searx.locales.get_engine_locale). Furthermore, in SearXNG locales like zh-TW (zh-CN) are mapped to zh_Hant (zh_Hans).

Conclusion: This plugin only auto-detects the languages a user can select in the language menu (supported_langs).
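
The restriction in this conclusion can be pictured as a simple membership check. The following is a minimal sketch: the set is a shortened, illustrative subset of supported_langs, and accept_detection is a hypothetical helper name, not a function of the plugin.

```python
from typing import Optional

# Shortened, illustrative subset of the plugin's supported_langs set.
SUPPORTED_LANGS = frozenset({"de", "en", "fr", "zh"})


def accept_detection(detected: Optional[str]) -> Optional[str]:
    """Keep a detected code only if the user could also have selected it."""
    if detected in SUPPORTED_LANGS:
        return detected
    # Codes known to the fastText model but absent from the language
    # menu (for example "als") are discarded.
    return None
```

A code such as als is part of the fastText model’s list but not of supported_langs, so a sketch like this would ignore it and leave the query’s locale unchanged.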

SearXNG’s locale of a query comes from the following sources (the highest-numbered source wins):

  1. The Accept-Language header from the user’s HTTP client.

  2. The user selects a locale in the preferences.

  3. The user selects a locale from the menu in the query form (e.g. :zh-TW).

  4. This plugin is activated in the preferences and the locale (only the language code, no region code) comes from fastText’s language detection.

Conclusion: The language selected by the user can conflict with the language detected by this plugin. For example, the user explicitly selects the German locale via the search syntax to search for a term that is identified as English (try :de-DE thermomix).
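
The precedence and the resulting conflict can be illustrated with a small sketch. It assumes that “highest wins” means a later entry in the list overrides an earlier one. The function resolve_locale and its parameter names are hypothetical and only model the four sources listed above.

```python
from typing import Optional


def resolve_locale(accept_language: Optional[str] = None,
                   preference: Optional[str] = None,
                   query_syntax: Optional[str] = None,
                   detected: Optional[str] = None) -> Optional[str]:
    """Return the locale from the highest-ranked source that is set."""
    # Ordered from lowest to highest priority, matching the list above.
    sources = [accept_language, preference, query_syntax, detected]
    result = None
    for value in sources:
        if value:
            # A later (higher-ranked) source overrides an earlier one.
            result = value
    return result
```

Under this assumption, `resolve_locale(query_syntax="de-DE", detected="en")` yields `"en"`: the detected language replaces the locale the user typed explicitly, which is the conflict described above.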

Hint

To SearXNG maintainers: please take into account that, under some circumstances, this plugin’s automatic language detection could run counter to users’ expectations. It’s not recommended to activate this plugin by default. It should always be the user’s decision whether to activate this plugin or not.

searx.plugins.autodetect_search_language.supported_langs = {'af', 'ar', 'be', 'bg', 'ca', 'cs', 'da', 'de', 'el', 'en', 'es', 'et', 'fa', 'fi', 'fil', 'fr', 'he', 'hi', 'hr', 'hu', 'id', 'is', 'it', 'ja', 'ko', 'lt', 'lv', 'nl', 'no', 'pl', 'pt', 'ro', 'ru', 'sk', 'sl', 'sr', 'sv', 'sw', 'th', 'tr', 'uk', 'vi', 'zh'}

Languages supported by most SearXNG engines (searx.languages.language_codes).