MediaWiki Extension CirrusSearch and Elasticsearch Setup

From WikiMLT
Revision as of 15:50, 30 August 2022 by Spas (talk | contribs) (→‎Check)

This is a short man­u­al how to set-up Elas­tic­search to be used with the MediaWiki's ex­ten­sion Cir­rusSearch which com­mu­ni­cate to the ser­vice by the ex­ten­sion Elas­ti­ca. You should choice an ap­pro­pri­ate Elas­tic­search ver­sion de­pend­ing on your Me­di­aWi­ki ver­sion. Cur­rent­ly I'm us­ing Me­di­aWi­ki 1.38 and it is rec­om­mend­ed to use Elas­tic­search 6.8.23+ with it. This ver­sion runs well over open­jdk-11 which is the de­fault Ja­va ver­sion on Ubun­tu Serv­er 22.04.

Elas­tic­search and the ex­ten­sion Elas­ti­ca are re­quired by some oth­er Me­di­aWi­ki ex­ten­sions as ex­ten­sion Trans­late where it is used as trans­la­tion mem­o­ry. It is al­so used by the NextCoud's ap­pli­ca­tion Full text search and more…

Ja­va Set­up

On Ubun­tu Serv­er the de­fault jdk and jre pack­ages can be in­stalled by the fol­low­ing com­mand.

sudo apt install -y apt-transport-https default-jdk default-jre

To check and switch the cur­rent ver­sion of Ja­va and Javac you can use the fol­low­ing com­mands.

sudo update-alternatives --config java
#Out­put
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
#Out­put
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                          Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/bin/javac    1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1

If you are us­ing Elas­tic­search 5.x it re­quires openjdk‑8 which can be in­stalled by the fol­low­ing com­mands. Af­ter the in­stal­la­tion use the above com­mands to switch the ver­sion in use.

#De­tails
sudo apt install openjdk-8-jre-headless 
sudo apt install openjdk-8-jdk-headless

Af­ter switch­ing the ver­sion of Ja­va you need to restart the Elas­tic­search ser­vice if it is al­ready in­stalled.

sudo systemctl restart elasticsearch.service 
curl 'http://127.0.0.1:9200' # do a test

Elas­tic­search Set­up

In­stal­la­tion

There is a cou­ple of ways how to In­stalling Elas­tic­search – via Dock­er, via Apt repos­i­to­ry, via .deb or .rpm pack­ages, etc. I pre­fer to man­u­al­ly down­load and in­stall it via .deb pack­age. Is I said be­fore for Me­di­aWi­ki 1.38 we need ver­sion 6.8.23+.

cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
sudo apt install ./elasticsearch-6.8.23.deb
#Ver­sions
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb

Af­ter in­stalling the pack­age the Elas­tic­search ser­vice must be en­abled and start­ed.

sudo systemctl enable --now elasticsearch.service   # enable and start the service
systemctl status elasticsearch.service              # check the status of the service
systemctl cat elasticsearch.service                 # check the current service's configuration

Check

You can check does the ser­vice work prop­er­ly by the fol­low­ing ap­proach.

curl 'http://127.0.0.1:9200'
#Out­put
{
  "name" : "W2uxKNc",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "JwkNoPi_THuiCA123-HKMg",
  "version" : {
    "number" : "6.8.23",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4f67856",
    "build_date" : "2022-01-06T21:30:50.087716Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.3",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

More de­tailed in­for­ma­tion can be ob­tained by the next com­mand.

curl -XGET 'http://localhost:9200/_nodes?pretty'
#Out­put
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "W2uxKNc9SQqZSVN4RIZmNg" : {
      "name" : "W2uxKNc",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "6.8.23",
      "build_flavor" : "default",
      "build_type" : "deb",
      "build_hash" : "4f67856",
      "total_indexing_buffer" : 418159001,
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "24456527872",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "settings" : {
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "cluster" : {
          "name" : "elasticsearch"
        },
        "node" : {
          "attr" : {
            "xpack" : {
              "installed" : "true"
            },
            "ml" : {
              "machine_memory" : "24456527872",
              "max_open_jobs" : "20",
              "enabled" : "true"
            }
          },
          "name" : "W2uxKNc"
        },
        "path" : {
          "data" : [
            "/var/lib/elasticsearch"
          ],
          "logs" : "/var/log/elasticsearch",
          "home" : "/usr/share/elasticsearch"
        },
        "client" : {
          "type" : "node"
        },
        "http" : {
          "type" : "security4",
          "type.default" : "netty4"
        },
        "transport" : {
          "type" : "security4",
          "features" : {
            "x-pack" : "true"
          },
          "type.default" : "netty4"
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "name" : "Linux",
        "pretty_name" : "Ubuntu 22.04.1 LTS",
        "arch" : "amd64",
        "version" : "5.15.0-46-generic",
        "available_processors" : 16,
        "allocated_processors" : 16
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 1041,
        "mlockall" : false
      },
      "jvm" : {
        "pid" : 1041,
        "version" : "11.0.16",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "11.0.16+8-post-Ubuntu-0ubuntu122.04",
        "vm_vendor" : "Ubuntu",
        "start_time_in_millis" : 1661769755777,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4181590016,
          "non_heap_init_in_bytes" : 7667712,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 0
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "CodeHeap 'non-nmethods'",
          "Metaspace",
          "CodeHeap 'profiled nmethods'",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CodeHeap 'non-profiled nmethods'",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true",
        "input_arguments" : [
          "-Xms4g",
          "-Xmx4g",
          "-XX:+UseConcMarkSweepGC",
          "-XX:CMSInitiatingOccupancyFraction=75",
          "-XX:+UseCMSInitiatingOccupancyOnly",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Dlog4j2.formatMsgNoLookups=true",
          "-Djava.io.tmpdir=/tmp/elasticsearch-14060835447651286248",
          "-XX:+HeapDumpOnOutOfMemoryError",
          "-XX:HeapDumpPath=/var/lib/elasticsearch",
          "-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log",
          "-Djava.locale.providers=COMPAT",
          "-XX:UseAVX=2",
          "-Des.path.home=/usr/share/elasticsearch",
          "-Des.path.conf=/etc/elasticsearch",
          "-Des.distribution.flavor=default",
          "-Des.distribution.type=deb"
        ]
      },
      "thread_pool" : {
        "watcher" : {
          "type" : "fixed",
          "min" : 50,
          "max" : 50,
          "queue_size" : 1000
        },
        "force_merge" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : -1
        },
        "security-token-key" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : 1000
        },
        "ml_datafeed" : {
          "type" : "fixed",
          "min" : 20,
          "max" : 20,
          "queue_size" : 200
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : -1
        },
        "ml_autodetect" : {
          "type" : "fixed",
          "min" : 80,
          "max" : 80,
          "queue_size" : 80
        },
        "index" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 200
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 8,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "generic" : {
          "type" : "scaling",
          "min" : 4,
          "max" : 128,
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "rollup_indexing" : {
          "type" : "fixed",
          "min" : 4,
          "max" : 4,
          "queue_size" : 4
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search" : {
          "type" : "fixed_auto_queue_size",
          "min" : 25,
          "max" : 25,
          "queue_size" : 1000
        },
        "ccr" : {
          "type" : "fixed",
          "min" : 32,
          "max" : 32,
          "queue_size" : 100
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "ml_utility" : {
          "type" : "fixed",
          "min" : 80,
          "max" : 80,
          "queue_size" : 500
        },
        "get" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 1000
        },
        "analyze" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : 16
        },
        "write" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 200
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search_throttled" : {
          "type" : "fixed_auto_queue_size",
          "min" : 1,
          "max" : 1,
          "queue_size" : 100
        }
      },
      "transport" : {
        "bound_address" : [
          "[::1]:9300",
          "127.0.0.1:9300"
        ],
        "publish_address" : "127.0.0.1:9300",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : [
          "[::1]:9200",
          "127.0.0.1:9200"
        ],
        "publish_address" : "127.0.0.1:9200",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ ],
      "modules" : [
        {
          "name" : "aggs-matrix-stats",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
          "classname" : "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "analysis-common",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds \"built in\" analyzers to Elasticsearch.",
          "classname" : "org.elasticsearch.analysis.common.CommonAnalysisPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-common",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
          "classname" : "org.elasticsearch.ingest.common.IngestCommonPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-geoip",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
          "classname" : "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-user-agent",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Ingest processor that extracts information from a user agent",
          "classname" : "org.elasticsearch.ingest.useragent.IngestUserAgentPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-expression",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Lucene expressions integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.expression.ExpressionPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-mustache",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Mustache scripting integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.mustache.MustachePlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-painless",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "An easy, safe and fast scripting language for Elasticsearch",
          "classname" : "org.elasticsearch.painless.PainlessPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "mapper-extras",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds advanced field mappers",
          "classname" : "org.elasticsearch.index.mapper.MapperExtrasPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "parent-join",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "This module adds the support parent-child queries and aggregations",
          "classname" : "org.elasticsearch.join.ParentJoinPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "percolator",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Percolator module adds capability to index queries and query these queries by specifying documents",
          "classname" : "org.elasticsearch.percolator.PercolatorPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "rank-eval",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Rank Eval module adds APIs to evaluate ranking quality.",
          "classname" : "org.elasticsearch.index.rankeval.RankEvalPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "reindex",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
          "classname" : "org.elasticsearch.index.reindex.ReindexPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "repository-url",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Module for URL repository",
          "classname" : "org.elasticsearch.plugin.repository.url.URLRepositoryPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "transport-netty4",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Netty 4 based transport implementation",
          "classname" : "org.elasticsearch.transport.Netty4Plugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "tribe",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Tribe module",
          "classname" : "org.elasticsearch.tribe.TribePlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ccr",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - CCR",
          "classname" : "org.elasticsearch.xpack.ccr.Ccr",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-core",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Core",
          "classname" : "org.elasticsearch.xpack.core.XPackPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-deprecation",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Deprecation",
          "classname" : "org.elasticsearch.xpack.deprecation.Deprecation",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-graph",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Graph",
          "classname" : "org.elasticsearch.xpack.graph.Graph",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ilm",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Index Lifecycle Management",
          "classname" : "org.elasticsearch.xpack.indexlifecycle.IndexLifecycle",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-logstash",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Logstash",
          "classname" : "org.elasticsearch.xpack.logstash.Logstash",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ml",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Machine Learning",
          "classname" : "org.elasticsearch.xpack.ml.MachineLearning",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : true
        },
        {
          "name" : "x-pack-monitoring",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Monitoring",
          "classname" : "org.elasticsearch.xpack.monitoring.Monitoring",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-rollup",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Rollup",
          "classname" : "org.elasticsearch.xpack.rollup.Rollup",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-security",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Security",
          "classname" : "org.elasticsearch.xpack.security.Security",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-sql",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Elasticsearch plugin that powers SQL for Elasticsearch",
          "classname" : "org.elasticsearch.xpack.sql.plugin.SqlPlugin",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-upgrade",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Upgrade",
          "classname" : "org.elasticsearch.xpack.upgrade.Upgrade",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-watcher",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Watcher",
          "classname" : "org.elasticsearch.xpack.watcher.Watcher",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        }
      ],
      "ingest" : {
        "processors" : [
          {
            "type" : "append"
          },
          {
            "type" : "bytes"
          },
          {
            "type" : "convert"
          },
          {
            "type" : "date"
          },
          {
            "type" : "date_index_name"
          },
          {
            "type" : "dissect"
          },
          {
            "type" : "dot_expander"
          },
          {
            "type" : "drop"
          },
          {
            "type" : "fail"
          },
          {
            "type" : "foreach"
          },
          {
            "type" : "geoip"
          },
          {
            "type" : "grok"
          },
          {
            "type" : "gsub"
          },
          {
            "type" : "join"
          },
          {
            "type" : "json"
          },
          {
            "type" : "kv"
          },
          {
            "type" : "lowercase"
          },
          {
            "type" : "pipeline"
          },
          {
            "type" : "remove"
          },
          {
            "type" : "rename"
          },
          {
            "type" : "script"
          },
          {
            "type" : "set"
          },
          {
            "type" : "set_security_user"
          },
          {
            "type" : "sort"
          },
          {
            "type" : "split"
          },
          {
            "type" : "trim"
          },
          {
            "type" : "uppercase"
          },
          {
            "type" : "urldecode"
          },
          {
            "type" : "user_agent"
          }
        ]
      }
    }
  }
}

Tweaks

Elas­tic­search could use huge amount of RAM. But, I've test­ed it for thin in­stances it work even with on­ly 128m. The main con­fig­u­ra­tion files are lo­cat­ed in­to the di­rec­to­ry /​​​etc/​​​elasticsearch/​​​. You can tweak the amount of Ram in use by tweak­ing the rel­e­vant lines in the file jvm.options. Note Xms and Xmx val­ues must be equal.

sudo nano /etc/elasticsearch/jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

#-Xms512m
#-Xmx512m
-Xms4g
-Xmx4g

Add restart al­ways di­rec­tive to the Elasticsearch's sys­temd unit.

sudo systemctl edit elasticsearch.service
[Service]
# SZS/MLT Tweak
Restart=always
RestartSec=3

To ap­ply the changes use the fol­low­ing com­mands.

sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
systemctl status elasticsearch.service
systemctl cat elasticsearch.service

Me­di­aWi­ki Set­up

The main pur­pose of this guide is how to set­up Elas­tic­search to be used by MediaWiki's ex­ten­sion Cir­rusSearch, so in this sec­tion we will de­scribe how to do that. In ad­di­tion al­so the ex­ten­sion Ad­vanced­Search will be in­stalled and con­fig­ured.

If you have in­stalled the ex­ten­sion PdfHan­dler (or some oth­er file han­dling ex­ten­sion) Cir­rusSearch will show re­sults from the files con­tent – in the con­fig­u­ra­tion be­low is shown how to boost these re­sults. How to con­fig­ure ex­ten­sion Trans­late to use Elas­tic­search is de­cried in the MediaWiki's doc­u­men­ta­tion in the ar­ti­cle Trans­la­tion mem­o­ries.

In­stall the Ex­ten­sions

First of all you need to in­stall the ex­ten­sions with­in the MediaWiki's doc­u­ment root. In the fol­low­ing ex­am­ple is used the ap­proach Down­load from Git.

: ${IP:="/var/www/wiki.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}               # The user that owns the $IP directory
: ${BRANCH:="REL1_38"}               # The MediaWiki's branch in use
cd "$IP/extensions"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/AdvancedSearch --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch --branch ${BRANCH}
sudo chown -R ${Owner}:${Owner} Elastica/ CirrusSearch/
for ext in Elastica CirrusSearch; do sudo -u ${Owner} composer install --no-dev; done

LocalSettings.php Con­fig­u­ra­tion

Open the con­fig­u­ra­tion file with your fa­vorite ed­i­tor and place the fol­low­ing lines at suit­able place (the end of the file is good place). In the ex­am­ple be­low is shown the cur­rent con­fig­u­ra­tion of this wi­ki. Af­ter the build­ing of the search in­dex (next sec­tion) Cir­rusSearch should work with­out the ad­vanced set­up. More op­tions are de­scribed in the Extension:CirrusSearch page, al­so some un­doc­u­ment­ed op­tions could be found with­in its extension.json file.

sudo nano "$IP/LocalSettings.php"
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
$wgAdvancedSearchDeepcatEnabled = false;    // https://www.mediawiki.org/wiki/Topic:Uw036nwsilvb6w3t
$wgAdvancedSearchBetaFeature = false;       // (enable it by default) https://m.mediawiki.org/wiki/Topic:Upflskaswcvrunka
$wgAdvancedSearchHighlighting = true;       // https://www.mediawiki.org/wiki/Manual:Configuration_settings_(alphabetical)
$wgOpenSearchDescriptionLength = 2500;      // https://www.mediawiki.org/wiki/Manual:$wgOpenSearchDescriptionLength

## Extension:Elastica
wfLoadExtension( 'Elastica' );

## Extension:CirrusSearch
wfLoadExtension( 'CirrusSearch' );
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log";
// $wgCirrusSearchIndexBaseName = 'wiki_db_name';       // https://www.mediawiki.org/wiki/Extension:CirrusSearch#Configuration
// $wgCirrusSearchServers = [ '10.120.201.1' ];         // The address of the Elasticsearch serer if it is not available at 'localhost'

## Extension:CirrusSearch Advanced Setup
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder';
// $wgCirrusSearchCompletionProfiles = 'normal';
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50
$wgCirrusSearchFragmentSize = 200;
// $wgCirrusExploreSimilarResults = true;

// Give much weight to the "file_text" in order to show 
// results from the PDFs content. This requires PdfHandler
$wgCirrusSearchWeights = [
    "title" => 20,
    "redirect" => 15,
    "category" => 8,
    "heading" => 5,
    "opening_text" => 3,
    "text" => 5,
    "auxiliary_text" => 15,
    "file_text" => 25
];

// https://www.mediawiki.org/wiki/Help:Namespaces#Localisation
$wgCirrusSearchNamespaceWeights = [
    "2" => 0.05,
    "4" => 0.3,
    "6" => 0.2,
    "8" => 0.05,
    "10" => 0.005,
    "12" => 0.2,
    "14" => 0.1
];

Build Search In­dex

How to build and up­date the CirrusSearch/​​​Elasticsearc in­dex is well de­scribed in the doc­u­ments README and UP­GRADE which comes with the ex­ten­sion. Here the im­por­tant steps re­lat­ed to the in­dex build­ing for a first time are ex­tract­ed and con­vert­ed to a script.

mlw-cirrussearch-elasticsearch-build-index-single-wiki.sh
#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @desc      Create elastic search index for a singel MediaWiki
#
# @install   Create executable file within /usr/local/bin/ and place this code as its content
# @source    https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README

: ${IP:="/var/www/wiki.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}               # The user that owns the $IP directory

# STEP 0
printf -- '\n**\nDisable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' $IP/LocalSettings.php
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' $IP/LocalSettings.php
echo -e "\n\n**\n$IP/LocalSettings.php\n*\n"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 5

# STEP 1
printf -- '\n**\nGenerate ElasticSearch Index for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver --conf $IP/LocalSettings.php
echo
sleep 5

# STEP 2
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' $IP/LocalSettings.php
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 5
printf -- '\n**\n Bootstrap the Search Index for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip --conf $IP/LocalSettings.php
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse --conf $IP/LocalSettings.php

# STEP 3
sleep 5
printf -- '\n**\nEnable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' $IP/LocalSettings.php
echo -e '\n\n**\n$IP/LocalSettings.php\n*\n'
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo

# Step 4
sleep 5
printf -- '\n**\nUpdate Cirrus Search Suggestions for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php --conf $IP/LocalSettings.php

За да за­поч­не ре­гу­ляр­но ин­дек­си­ра­не на съ­дър­жа­ни­е­то на уики­то, спря­мо кон­фи­гу­ра­ци­я­та, напра­ве­на в /var/­www/­*/­Local­Sett­ings.php и до­ку­мен­та­ци­я­та на mw:Extension­:­CirrusSearch тряб­ва да напра­ви пър­во­на­чал­на ин­дек­са­ция, да се из­пъл­нят за­да­чи­те, ко­и­то ще съз­да­де тя, да се ре­ге­не­ри­ра ин­дек­са на съ­дър­жа­ни­е­то и от­но­во да се из­праз­ни опаш­ка­та със за­да­чи­те. За цел­та мо­гат да се из­пол­з­ват скрип­то­ве­те за под­дръж­ка, опи­са­ни в сек­ци­я­та Me­di­aWi­ki.

mw-maintenance-elasticsearch-index.sh
mw-maintenance-runJobs.sh cli
mw-maintenance-rebuildAll.sh
mw-maintenance-runJobs.sh cli

Ad­di­tion­al Set­up

Ac­cess Elas­tic­search via SSH Tun­nel

Us­ing such ap­proach is suit­able on­ly for test pur­pose, here is a man­u­al how to set-up:

Elas­tic­search Watch Scripts

В до­пъл­не­ние е раз­ра­бо­тен скрип­та elasticsearch​-watch​.sh, ка­то чрез crontab за­да­ча се пра­ви пе­ри­о­дич­на про­вер­ка и при не­об­хо­ди­мост рес­тар­ти­ра­не. Скрип­та из­пра­ща пис­мо до vectoria@altclavis.com, ако настъ­пи съ­би­тие.

sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/elasticsearch-watch.sh

Ref­er­ences