Startpage Engines

Startpage’s language & region selectors are a mess ..

Startpage regions

In the list of regions there are tags we need to map to common region tags:

pt-BR_BR --> pt_BR
zh-CN_CN --> zh_Hans_CN
zh-TW_TW --> zh_Hant_TW
zh-TW_HK --> zh_Hant_HK
en-GB_GB --> en_GB

and there is at least one tag with a three-letter language tag (ISO 639-2):

fil_PH --> fil_PH

The locale code no_NO from Startpage does not exist and is mapped to nb-NO:

babel.core.UnknownLocaleError: unknown locale 'no_NO'

For reference see languages-subtag at iana; no is the macrolanguage [1] and W3C recommends subtag over macrolanguage [2].
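The mappings listed in this section can be sketched as a plain lookup table. This is only an illustration, not the engine's actual code: the names `STARTPAGE_REGION_MAP` and `normalize_region` are hypothetical, and the target for no_NO is written with an underscore (nb_NO) to match the other entries.

```python
# Startpage region tag -> common region tag, taken from the list above.
STARTPAGE_REGION_MAP = {
    "pt-BR_BR": "pt_BR",
    "zh-CN_CN": "zh_Hans_CN",
    "zh-TW_TW": "zh_Hant_TW",
    "zh-TW_HK": "zh_Hant_HK",
    "en-GB_GB": "en_GB",
    # 'no' is a macrolanguage; the text maps no_NO to nb-NO
    # (spelled here with an underscore, like the other entries).
    "no_NO": "nb_NO",
}


def normalize_region(startpage_tag: str) -> str:
    """Return the common tag for a Startpage region tag (hypothetical helper).

    Tags without an entry in the table, such as fil_PH, pass through unchanged.
    """
    return STARTPAGE_REGION_MAP.get(startpage_tag, startpage_tag)
```

Tags that are already valid, including three-letter ones like fil_PH, are returned as they are, so only the irregular tags need an entry.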

Startpage languages

send_accept_language_header:

The displayed names in Startpage’s settings page depend on the location of the IP when the Accept-Language HTTP header is unset. In fetch_traits we use:

'Accept-Language': "en-US,en;q=0.5",
..

to get uniform names independent of the IP.
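As a minimal sketch of the idea, the fixed header can be attached to a request before it is sent. The function name `build_settings_request` and the URL are hypothetical and serve only as placeholders; the request is prepared but never sent.

```python
import urllib.request

# Header value quoted in the text above.
ACCEPT_LANGUAGE = "en-US,en;q=0.5"


def build_settings_request(url: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request with a fixed Accept-Language."""
    return urllib.request.Request(url, headers={"Accept-Language": ACCEPT_LANGUAGE})


# Placeholder URL, not Startpage's real settings page.
req = build_settings_request("https://example.org/settings")
```

With a fixed header, the names returned by the page no longer vary with the location of the requesting IP.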

Startpage categories

Startpage’s category (for Web-search, News, Videos, ..) is set by startpage_categ in settings.yml:

- name: startpage
  engine: startpage
  startpage_categ: web
  ...

Hint

The default category is web; categories other than web are not yet implemented.

searx.engines.startpage.fetch_traits(engine_traits: EngineTraits)

Fetch languages and regions from Startpage.

searx.engines.startpage.get_sc_code(searxng_locale, params)

Get an actual sc argument from Startpage’s search form (HTML page).

Startpage puts an sc argument on every HTML search form. Without this argument Startpage assumes the request comes from a bot. We do not know what is encoded in the value of the sc argument, but it seems to be a kind of time-stamp.

Startpage’s search form generates a new sc-code on each request. This function scrapes a new sc-code from Startpage’s home page every sc_code_cache_sec seconds.
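The caching behaviour described here can be sketched as a small time-based cache. This is an illustration under assumptions, not the engine's implementation: the class `ScCodeCache`, its `fetch` callback (standing in for scraping the home page) and the injectable `clock` are all hypothetical.

```python
import time
from typing import Callable, Optional

sc_code_cache_sec = 30  # same value as the module variable documented on this page


class ScCodeCache:
    """Keep one sc-code in memory and refresh it once it is older than ttl."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl: float = sc_code_cache_sec,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch    # hypothetical callback that scrapes a fresh sc-code
        self._ttl = ttl
        self._clock = clock    # injectable so the cache can be tested without sleeping
        self._value: Optional[str] = None
        self._stamp = 0.0

    def get(self) -> str:
        now = self._clock()
        # Refresh on first use or when the cached code has expired.
        if self._value is None or now - self._stamp >= self._ttl:
            self._value = self._fetch()
            self._stamp = now
        return self._value
```

Passing the clock as a parameter is a design choice for testability; a module-level timestamp would work just as well in practice.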

searx.engines.startpage.request(query, params)

Assemble a Startpage request.

To avoid a CAPTCHA we need to send a well-formed HTTP POST request with a cookie. The request must be identical to the one built by Startpage’s search form:

  • in the cookie the region is selected

  • in the HTTP POST data the language is selected

Additionally, the arguments from Startpage’s search form need to be set in the HTTP POST data; compare the <input> elements of search_form_xpath.
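Copying the form's arguments into the POST data could look roughly like the sketch below. It uses the standard library's `html.parser` instead of an XPath query, and the names `SearchFormInputs` and `build_post_data` as well as the field names `query` and `language` are assumptions made for illustration; the cookie that selects the region is not covered.

```python
from html.parser import HTMLParser


class SearchFormInputs(HTMLParser):
    """Collect name/value pairs of <input> elements inside <form id="search">."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == "search":
            self.in_form = True
        elif tag == "input" and self.in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_post_data(form_html: str, query: str, language: str) -> dict:
    """Start from the form's own arguments, then set query and language."""
    parser = SearchFormInputs()
    parser.feed(form_html)
    data = dict(parser.fields)
    data["query"] = query        # hypothetical field name
    data["language"] = language  # hypothetical field name
    return data
```

Starting from the scraped form fields means that arguments such as sc are carried over without the code having to know what they encode.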

searx.engines.startpage.max_page = 18

Tested with a maximum of 18 pages (argument page); to be safe, max is set to 20.

searx.engines.startpage.sc_code_cache_sec = 30

Time in seconds the sc-code is cached in memory, see get_sc_code.

searx.engines.startpage.search_form_xpath = '//form[@id="search"]'

XPath of Startpage’s original search form.

searx.engines.startpage.send_accept_language_header = True

Startpage tries to guess the user’s language and territory from the HTTP Accept-Language header. Optionally, the user can select a search language (which can differ from the UI language) and a region filter.

searx.engines.startpage.startpage_categ = 'web'

Startpage’s category, see Startpage categories.