MediaWiki Extension CirrusSearch and Elasticsearch Setup

From WikiMLT

This is a short manual on how to set up Elasticsearch for use with the MediaWiki extension CirrusSearch, which communicates with the service through the extension Elastica. You should choose an appropriate Elasticsearch version depending on your MediaWiki version. Currently I'm using MediaWiki 1.39, and it is recommended to use Elasticsearch 7.10.2 with it. This version runs well on openjdk-11, which is the default Java version on Ubuntu Server 22.04.

Elasticsearch and the extension Elastica are also required by some other MediaWiki extensions, such as the extension Translate, where Elasticsearch is used as translation memory. It is also used by Nextcloud's application Full text search, and more…

See also MediaWiki Job Queue.

Java Setup

On Ubuntu Server the default JDK and JRE packages can be installed with the following command.

sudo apt install -y apt-transport-https default-jdk default-jre

To check and switch the current version of java and javac, you can use the following commands.

sudo update-alternatives --config java
#Output
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
#Output
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                          Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/bin/javac    1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
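
The interactive prompts above are awkward inside scripts. The following is only a minimal sketch of how the active Java version could be checked non-interactively. The function names java_major and current_java_version are my own, not part of any package, and the sketch assumes that java -version prints the version in double quotes, as OpenJDK builds do.

```shell
# Print the major Java version for a version string.
# Handles both the legacy scheme (1.8.0_362 -> 8) and the modern one (11.0.18 -> 11).
java_major() {
  _jm_v="$1"
  case "$_jm_v" in
    1.*) _jm_v="${_jm_v#1.}" ;;   # legacy scheme: drop the leading "1."
  esac
  # Keep everything before the first ".", "_" or "-".
  printf '%s\n' "${_jm_v%%[._-]*}"
}

# Read the version string reported by the currently selected java binary.
# (java -version writes to stderr, hence the redirection.)
current_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   java_major "$(current_java_version)"
# A non-interactive switch is possible with --set and one of the paths listed above:
#   sudo update-alternatives --set java /usr/lib/jvm/java-11-openjdk-amd64/bin/java
```

The --set option does the same job as choosing a selection number at the prompt, so the result can be verified with the same check afterwards.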

If you are using Elasticsearch 5.x, it requires openjdk-8, which can be installed with the following commands. After the installation, use the commands above to switch the version in use.

#Details
sudo apt install openjdk-8-jre-headless 
sudo apt install openjdk-8-jdk-headless

After switching the version of Java, you need to restart the Elasticsearch service if it is already installed.

sudo systemctl restart elasticsearch.service 
curl 'http://127.0.0.1:9200' # do a test
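
Elasticsearch usually needs some time to come up after a restart, so a curl issued right after systemctl restart may fail even though nothing is wrong. A possible workaround is a small retry helper like the sketch below. The function name retry and the attempt and delay values in the usage comment are assumptions of mine, not something required by Elasticsearch.

```shell
# retry TRIES DELAY COMMAND [ARGS...]
# Run COMMAND up to TRIES times, sleeping DELAY seconds between attempts.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
retry() {
  _retry_tries="$1"
  _retry_delay="$2"
  shift 2
  _retry_i=1
  while [ "$_retry_i" -le "$_retry_tries" ]; do
    # The output of the probe is discarded; only its exit status matters.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    _retry_i=$((_retry_i + 1))
    # Do not sleep after the last failed attempt.
    if [ "$_retry_i" -le "$_retry_tries" ]; then
      sleep "$_retry_delay"
    fi
  done
  return 1
}

# Usage (up to 30 attempts, 2 seconds apart):
#   sudo systemctl restart elasticsearch.service
#   retry 30 2 curl -fsS 'http://127.0.0.1:9200' && echo "Elasticsearch is up"
```

The -f option makes curl return a non-zero exit status on HTTP errors, which is what lets the helper tell a failed probe from a successful one.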

Elasticsearch Setup

Installation

There are a couple of ways to install Elasticsearch: via Docker, via the Apt repository, via .deb or .rpm packages, etc. I prefer to manually download and install it via the .deb package. As mentioned above, for MediaWiki 1.39+ we need Elasticsearch version 7.10.2.

cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-amd64.deb
sudo apt install ./elasticsearch-7.10.2-amd64.deb
#Versions
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb
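
Note that the file names above differ: the 5.x and 6.x packages have no architecture suffix, while the 7.x ones end with -amd64. If you download different versions from a script, a small helper can build the URL. This is only a sketch: the function name es_deb_url is hypothetical, and the rule "major version 7 or newer gets the -amd64 suffix" is my assumption generalized from the URLs listed above, so check that the file exists before relying on it.

```shell
# Print the download URL of the Elasticsearch .deb package for a given version.
# Assumption: releases with major version >= 7 carry an "-amd64" suffix,
# older ones do not (derived only from the URLs listed in this article).
es_deb_url() {
  _es_version="$1"
  _es_major="${_es_version%%.*}"   # e.g. 7.10.2 -> 7
  _es_base="https://artifacts.elastic.co/downloads/elasticsearch"
  if [ "$_es_major" -ge 7 ]; then
    printf '%s/elasticsearch-%s-amd64.deb\n' "$_es_base" "$_es_version"
  else
    printf '%s/elasticsearch-%s.deb\n' "$_es_base" "$_es_version"
  fi
}

# Usage:
#   wget "$(es_deb_url 7.10.2)"
```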

After installing the package, the Elasticsearch service must be enabled and started.

sudo systemctl enable --now elasticsearch.service   # enable and start the service
systemctl status elasticsearch.service              # check the status of the service
systemctl cat elasticsearch.service                 # check the current service's configuration

Check

You can check whether the service works properly with the following command.

curl 'http://127.0.0.1:9200'
#Output
{
  "name" : "metalevel.tech",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "znG-mCHAQU6L3oVR9UIthg",
  "version" : {
    "number" : "7.10.2",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
    "build_date" : "2021-01-13T00:42:12.435326Z",
    "build_snapshot" : false,
    "lucene_version" : "8.7.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
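
If you want to verify the installed version from a script instead of reading the JSON by eye, the "number" field of the response above can be extracted. The sketch below uses only sed, to avoid extra dependencies. The function name es_version_from_json is my own, and the pattern assumes that "number" appears in the response in the same form as in the output above. A JSON-aware tool would be more robust.

```shell
# Read the JSON response of the Elasticsearch root endpoint on stdin
# and print the value of the "number" field (the Elasticsearch version).
es_version_from_json() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   version="$(curl -fsS 'http://127.0.0.1:9200' | es_version_from_json)"
#   [ "$version" = "7.10.2" ] || echo "Unexpected Elasticsearch version: $version"
```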

More detailed information can be obtained with the next command.

curl -XGET 'http://localhost:9200/_nodes?pretty'
#Output
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "BNfFzNWMTF20Xd5nlcwt6w" : {
      "name" : "metalevel.tech",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "7.10.2",
      "build_flavor" : "default",
      "build_type" : "deb",
      "build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
      "total_indexing_buffer" : 214748364,
      "roles" : [
        "data",
        "data_cold",
        "data_content",
        "data_hot",
        "data_warm",
        "ingest",
        "master",
        "ml",
        "remote_cluster_client",
        "transform"
      ],
      "attributes" : {
        "ml.machine_memory" : "25112887296",
        "xpack.installed" : "true",
        "transform.node" : "true",
        "ml.max_open_jobs" : "20"
      },
      "settings" : {
        "client" : {
          "type" : "node"
        },
        "cluster" : {
          "name" : "elasticsearch",
          "election" : {
            "strategy" : "supports_voting_only"
          }
        },
        "http" : {
          "type" : "security4",
          "type.default" : "netty4"
        },
        "node" : {
          "attr" : {
            "transform" : {
              "node" : "true"
            },
            "xpack" : {
              "installed" : "true"
            },
            "ml" : {
              "machine_memory" : "25112887296",
              "max_open_jobs" : "20"
            }
          },
          "name" : "metalevel.tech",
          "pidfile" : "/var/run/elasticsearch/elasticsearch.pid"
        },
        "path" : {
          "data" : [
            "/var/lib/elasticsearch"
          ],
          "logs" : "/var/log/elasticsearch",
          "home" : "/usr/share/elasticsearch"
        },
        "transport" : {
          "type" : "security4",
          "features" : {
            "x-pack" : "true"
          },
          "type.default" : "netty4"
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "name" : "Linux",
        "pretty_name" : "Ubuntu 22.04.2 LTS",
        "arch" : "amd64",
        "version" : "5.15.0-67-generic",
        "available_processors" : 16,
        "allocated_processors" : 16
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 422075,
        "mlockall" : false
      },
      "jvm" : {
        "pid" : 422075,
        "version" : "15.0.1",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "15.0.1+9",
        "vm_vendor" : "AdoptOpenJDK",
        "bundled_jdk" : true,
        "using_bundled_jdk" : true,
        "start_time_in_millis" : 1677873247146,
        "mem" : {
          "heap_init_in_bytes" : 2147483648,
          "heap_max_in_bytes" : 2147483648,
          "non_heap_init_in_bytes" : 7667712,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 0
        },
        "gc_collectors" : [
          "G1 Young Generation",
          "G1 Old Generation"
        ],
        "memory_pools" : [
          "CodeHeap 'non-nmethods'",
          "Metaspace",
          "CodeHeap 'profiled nmethods'",
          "Compressed Class Space",
          "G1 Eden Space",
          "G1 Old Gen",
          "G1 Survivor Space",
          "CodeHeap 'non-profiled nmethods'"
        ],
        "using_compressed_ordinary_object_pointers" : "true",
        "input_arguments" : [
          "-Xshare:auto",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-XX:+ShowCodeDetailsInExceptionMessages",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dio.netty.allocator.numDirectArenas=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Djava.locale.providers=SPI,COMPAT",
          "-Xms2g",
          "-Xmx2g",
          "-XX:+UseG1GC",
          "-XX:G1ReservePercent=25",
          "-XX:InitiatingHeapOccupancyPercent=30",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-XX:+ShowCodeDetailsInExceptionMessages",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Dlog4j2.formatMsgNoLookups=true",
          "-Djava.io.tmpdir=/tmp/elasticsearch-5908758710646640467",
          "-XX:+HeapDumpOnOutOfMemoryError",
          "-XX:HeapDumpPath=/var/lib/elasticsearch",
          "-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log",
          "-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m",
          "-Djava.locale.providers=COMPAT",
          "-XX:UseAVX=0",
          "-XX:MaxDirectMemorySize=1073741824",
          "-Des.path.home=/usr/share/elasticsearch",
          "-Des.path.conf=/etc/elasticsearch",
          "-Des.distribution.flavor=default",
          "-Des.distribution.type=deb",
          "-Des.bundled_jdk=true"
        ]
      },
      "thread_pool" : {
        "force_merge" : {
          "type" : "fixed",
          "size" : 1,
          "queue_size" : -1
        },
        "ml_datafeed" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 512,
          "keep_alive" : "1m",
          "queue_size" : -1
        },
        "searchable_snapshots_cache_fetch_async" : {
          "type" : "scaling",
          "core" : 0,
          "max" : 32,
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "size" : 8,
          "queue_size" : -1
        },
        "rollup_indexing" : {
          "type" : "fixed",
          "size" : 4,
          "queue_size" : 4
        },
        "search" : {
          "type" : "fixed_auto_queue_size",
          "size" : 25,
          "queue_size" : 1000
        },
        "security-crypto" : {
          "type" : "fixed",
          "size" : 8,
          "queue_size" : 1000
        },
        "ccr" : {
          "type" : "fixed",
          "size" : 32,
          "queue_size" : 100
        },
        "flush" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "ml_utility" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 2048,
          "keep_alive" : "10m",
          "queue_size" : -1
        },
        "get" : {
          "type" : "fixed",
          "size" : 16,
          "queue_size" : 1000
        },
        "system_read" : {
          "type" : "fixed",
          "size" : 5,
          "queue_size" : 2000
        },
        "transform_indexing" : {
          "type" : "fixed",
          "size" : 4,
          "queue_size" : 4
        },
        "write" : {
          "type" : "fixed",
          "size" : 16,
          "queue_size" : 10000
        },
        "watcher" : {
          "type" : "fixed",
          "size" : 50,
          "queue_size" : 1000
        },
        "security-token-key" : {
          "type" : "fixed",
          "size" : 1,
          "queue_size" : 1000
        },
        "refresh" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 8,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "system_write" : {
          "type" : "fixed",
          "size" : 5,
          "queue_size" : 1000
        },
        "generic" : {
          "type" : "scaling",
          "core" : 4,
          "max" : 128,
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "warmer" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "management" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "analyze" : {
          "type" : "fixed",
          "size" : 1,
          "queue_size" : 16
        },
        "searchable_snapshots_cache_prewarming" : {
          "type" : "scaling",
          "core" : 0,
          "max" : 32,
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "ml_job_comms" : {
          "type" : "scaling",
          "core" : 4,
          "max" : 2048,
          "keep_alive" : "1m",
          "queue_size" : -1
        },
        "snapshot" : {
          "type" : "scaling",
          "core" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search_throttled" : {
          "type" : "fixed_auto_queue_size",
          "size" : 1,
          "queue_size" : 100
        }
      },
      "transport" : {
        "bound_address" : [
          "[::1]:9300",
          "127.0.0.1:9300"
        ],
        "publish_address" : "127.0.0.1:9300",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : [
          "[::1]:9200",
          "127.0.0.1:9200"
        ],
        "publish_address" : "127.0.0.1:9200",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ ],
      "modules" : [
        {
          "name" : "aggs-matrix-stats",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
          "classname" : "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "analysis-common",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Adds \"built in\" analyzers to Elasticsearch.",
          "classname" : "org.elasticsearch.analysis.common.CommonAnalysisPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "constant-keyword",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Module for the constant-keyword field type, which is a specialization of keyword for the case when all documents have the same value.",
          "classname" : "org.elasticsearch.xpack.constantkeyword.ConstantKeywordMapperPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "flattened",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Module for the flattened field type, which allows JSON objects to be flattened into a single field.",
          "classname" : "org.elasticsearch.xpack.flattened.FlattenedMapperPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "frozen-indices",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin for the frozen indices functionality",
          "classname" : "org.elasticsearch.xpack.frozen.FrozenIndices",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-common",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
          "classname" : "org.elasticsearch.ingest.common.IngestCommonPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-geoip",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
          "classname" : "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-user-agent",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Ingest processor that extracts information from a user agent",
          "classname" : "org.elasticsearch.ingest.useragent.IngestUserAgentPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "kibana",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Plugin exposing APIs for Kibana system indices",
          "classname" : "org.elasticsearch.kibana.KibanaPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-expression",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Lucene expressions integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.expression.ExpressionPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-mustache",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Mustache scripting integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.mustache.MustachePlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-painless",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "An easy, safe and fast scripting language for Elasticsearch",
          "classname" : "org.elasticsearch.painless.PainlessPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "mapper-extras",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Adds advanced field mappers",
          "classname" : "org.elasticsearch.index.mapper.MapperExtrasPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "mapper-version",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin for a field type to store sofware versions",
          "classname" : "org.elasticsearch.xpack.versionfield.VersionFieldPlugin",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "parent-join",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "This module adds the support parent-child queries and aggregations",
          "classname" : "org.elasticsearch.join.ParentJoinPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "percolator",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Percolator module adds capability to index queries and query these queries by specifying documents",
          "classname" : "org.elasticsearch.percolator.PercolatorPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "rank-eval",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "The Rank Eval module adds APIs to evaluate ranking quality.",
          "classname" : "org.elasticsearch.index.rankeval.RankEvalPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "reindex",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
          "classname" : "org.elasticsearch.index.reindex.ReindexPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "repositories-metering-api",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Repositories metering API",
          "classname" : "org.elasticsearch.xpack.repositories.metering.RepositoriesMeteringPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "repository-url",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Module for URL repository",
          "classname" : "org.elasticsearch.plugin.repository.url.URLRepositoryPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "search-business-rules",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin for applying business rules to search result rankings",
          "classname" : "org.elasticsearch.xpack.searchbusinessrules.SearchBusinessRules",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "searchable-snapshots",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin for the searchable snapshots functionality",
          "classname" : "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "spatial",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin for Basic Spatial features",
          "classname" : "org.elasticsearch.xpack.spatial.SpatialPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "systemd",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Integrates Elasticsearch with systemd",
          "classname" : "org.elasticsearch.systemd.SystemdPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "transform",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin to transform data",
          "classname" : "org.elasticsearch.xpack.transform.Transform",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "transport-netty4",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Netty 4 based transport implementation",
          "classname" : "org.elasticsearch.transport.Netty4Plugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "unsigned-long",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Module for the unsigned long field type",
          "classname" : "org.elasticsearch.xpack.unsignedlong.UnsignedLongMapperPlugin",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "vectors",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin for working with vectors",
          "classname" : "org.elasticsearch.xpack.vectors.Vectors",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "wildcard",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A plugin for a keyword field type with efficient wildcard search",
          "classname" : "org.elasticsearch.xpack.wildcard.Wildcard",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-analytics",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Analytics",
          "classname" : "org.elasticsearch.xpack.analytics.AnalyticsPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-async",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A module which handles common async operations",
          "classname" : "org.elasticsearch.xpack.async.AsyncResultsIndexPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-async-search",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "A module which allows to track the progress of a search asynchronously.",
          "classname" : "org.elasticsearch.xpack.search.AsyncSearch",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-autoscaling",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Autoscaling",
          "classname" : "org.elasticsearch.xpack.autoscaling.Autoscaling",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ccr",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - CCR",
          "classname" : "org.elasticsearch.xpack.ccr.Ccr",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-core",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Core",
          "classname" : "org.elasticsearch.xpack.core.XPackPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-data-streams",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Data Streams",
          "classname" : "org.elasticsearch.xpack.datastreams.DataStreamsPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-deprecation",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Deprecation",
          "classname" : "org.elasticsearch.xpack.deprecation.Deprecation",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-enrich",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Enrich",
          "classname" : "org.elasticsearch.xpack.enrich.EnrichPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-eql",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "The Elasticsearch plugin that powers EQL for Elasticsearch",
          "classname" : "org.elasticsearch.xpack.eql.plugin.EqlPlugin",
          "extended_plugins" : [
            "x-pack-ql",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-graph",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Graph",
          "classname" : "org.elasticsearch.xpack.graph.Graph",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-identity-provider",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Identity Provider",
          "classname" : "org.elasticsearch.xpack.idp.IdentityProviderPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ilm",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Index Lifecycle Management",
          "classname" : "org.elasticsearch.xpack.ilm.IndexLifecycle",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-logstash",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Logstash",
          "classname" : "org.elasticsearch.xpack.logstash.Logstash",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ml",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Machine Learning",
          "classname" : "org.elasticsearch.xpack.ml.MachineLearning",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : true
        },
        {
          "name" : "x-pack-monitoring",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Monitoring",
          "classname" : "org.elasticsearch.xpack.monitoring.Monitoring",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ql",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch infrastructure plugin for EQL and SQL for Elasticsearch",
          "classname" : "org.elasticsearch.xpack.ql.plugin.QlPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-rollup",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Rollup",
          "classname" : "org.elasticsearch.xpack.rollup.Rollup",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-security",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Security",
          "classname" : "org.elasticsearch.xpack.security.Security",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-sql",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "The Elasticsearch plugin that powers SQL for Elasticsearch",
          "classname" : "org.elasticsearch.xpack.sql.plugin.SqlPlugin",
          "extended_plugins" : [
            "x-pack-ql",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-stack",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Stack",
          "classname" : "org.elasticsearch.xpack.stack.StackPlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-voting-only-node",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Voting-only node",
          "classname" : "org.elasticsearch.cluster.coordination.VotingOnlyNodePlugin",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-watcher",
          "version" : "7.10.2",
          "elasticsearch_version" : "7.10.2",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Watcher",
          "classname" : "org.elasticsearch.xpack.watcher.Watcher",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : false
        }
      ],
      "ingest" : {
        "processors" : [
          {
            "type" : "append"
          },
          {
            "type" : "bytes"
          },
          {
            "type" : "circle"
          },
          {
            "type" : "convert"
          },
          {
            "type" : "csv"
          },
          {
            "type" : "date"
          },
          {
            "type" : "date_index_name"
          },
          {
            "type" : "dissect"
          },
          {
            "type" : "dot_expander"
          },
          {
            "type" : "drop"
          },
          {
            "type" : "enrich"
          },
          {
            "type" : "fail"
          },
          {
            "type" : "foreach"
          },
          {
            "type" : "geoip"
          },
          {
            "type" : "grok"
          },
          {
            "type" : "gsub"
          },
          {
            "type" : "html_strip"
          },
          {
            "type" : "inference"
          },
          {
            "type" : "join"
          },
          {
            "type" : "json"
          },
          {
            "type" : "kv"
          },
          {
            "type" : "lowercase"
          },
          {
            "type" : "pipeline"
          },
          {
            "type" : "remove"
          },
          {
            "type" : "rename"
          },
          {
            "type" : "script"
          },
          {
            "type" : "set"
          },
          {
            "type" : "set_security_user"
          },
          {
            "type" : "sort"
          },
          {
            "type" : "split"
          },
          {
            "type" : "trim"
          },
          {
            "type" : "uppercase"
          },
          {
            "type" : "urldecode"
          },
          {
            "type" : "user_agent"
          }
        ]
      },
      "aggregations" : {
        "adjacency_matrix" : {
          "types" : [
            "other"
          ]
        },
        "auto_date_histogram" : {
          "types" : [
            "boolean",
            "date",
            "numeric"
          ]
        },
        "avg" : {
          "types" : [
            "boolean",
            "date",
            "histogram",
            "numeric"
          ]
        },
        "boxplot" : {
          "types" : [
            "histogram",
            "numeric"
          ]
        },
        "cardinality" : {
          "types" : [
            "boolean",
            "bytes",
            "date",
            "geopoint",
            "geoshape",
            "ip",
            "numeric",
            "range"
          ]
        },
        "children" : {
          "types" : [
            "other"
          ]
        },
        "composite" : {
          "types" : [
            "other"
          ]
        },
        "date_histogram" : {
          "types" : [
            "boolean",
            "date",
            "numeric",
            "range"
          ]
        },
        "date_range" : {
          "types" : [
            "boolean",
            "date",
            "numeric"
          ]
        },
        "diversified_sampler" : {
          "types" : [
            "boolean",
            "bytes",
            "date",
            "numeric"
          ]
        },
        "extended_stats" : {
          "types" : [
            "boolean",
            "date",
            "numeric"
          ]
        },
        "filter" : {
          "types" : [
            "other"
          ]
        },
        "filters" : {
          "types" : [
            "other"
          ]
        },
        "geo_bounds" : {
          "types" : [
            "geopoint",
            "geoshape"
          ]
        },
        "geo_centroid" : {
          "types" : [
            "geopoint",
            "geoshape"
          ]
        },
        "geo_distance" : {
          "types" : [
            "geopoint"
          ]
        },
        "geohash_grid" : {
          "types" : [
            "geopoint",
            "geoshape"
          ]
        },
        "geotile_grid" : {
          "types" : [
            "geopoint",
            "geoshape"
          ]
        },
        "global" : {
          "types" : [
            "other"
          ]
        },
        "histogram" : {
          "types" : [
            "boolean",
            "date",
            "histogram",
            "numeric",
            "range"
          ]
        },
        "ip_range" : {
          "types" : [
            "ip"
          ]
        },
        "matrix_stats" : {
          "types" : [
            "other"
          ]
        },
        "max" : {
          "types" : [
            "boolean",
            "date",
            "histogram",
            "numeric"
          ]
        },
        "median_absolute_deviation" : {
          "types" : [
            "numeric"
          ]
        },
        "min" : {
          "types" : [
            "boolean",
            "date",
            "histogram",
            "numeric"
          ]
        },
        "missing" : {
          "types" : [
            "boolean",
            "bytes",
            "date",
            "geopoint",
            "ip",
            "numeric",
            "range"
          ]
        },
        "nested" : {
          "types" : [
            "other"
          ]
        },
        "parent" : {
          "types" : [
            "other"
          ]
        },
        "percentile_ranks" : {
          "types" : [
            "boolean",
            "date",
            "histogram",
            "numeric"
          ]
        },
        "percentiles" : {
          "types" : [
            "boolean",
            "date",
            "histogram",
            "numeric"
          ]
        },
        "range" : {
          "types" : [
            "boolean",
            "date",
            "numeric"
          ]
        },
        "rare_terms" : {
          "types" : [
            "boolean",
            "bytes",
            "date",
            "ip",
            "numeric"
          ]
        },
        "rate" : {
          "types" : [
            "boolean",
            "numeric"
          ]
        },
        "reverse_nested" : {
          "types" : [
            "other"
          ]
        },
        "sampler" : {
          "types" : [
            "other"
          ]
        },
        "scripted_metric" : {
          "types" : [
            "other"
          ]
        },
        "significant_terms" : {
          "types" : [
            "boolean",
            "bytes",
            "date",
            "ip",
            "numeric"
          ]
        },
        "significant_text" : {
          "types" : [
            "other"
          ]
        },
        "stats" : {
          "types" : [
            "boolean",
            "date",
            "numeric"
          ]
        },
        "string_stats" : {
          "types" : [
            "bytes"
          ]
        },
        "sum" : {
          "types" : [
            "boolean",
            "date",
            "histogram",
            "numeric"
          ]
        },
        "t_test" : {
          "types" : [
            "numeric"
          ]
        },
        "terms" : {
          "types" : [
            "boolean",
            "bytes",
            "date",
            "ip",
            "numeric"
          ]
        },
        "top_hits" : {
          "types" : [
            "other"
          ]
        },
        "top_metrics" : {
          "types" : [
            "other"
          ]
        },
        "value_count" : {
          "types" : [
            "boolean",
            "bytes",
            "date",
            "geopoint",
            "geoshape",
            "histogram",
            "ip",
            "numeric",
            "range"
          ]
        },
        "variable_width_histogram" : {
          "types" : [
            "numeric"
          ]
        },
        "weighted_avg" : {
          "types" : [
            "numeric"
          ]
        }
      }
    }
  }
}

Tweaks

Elasticsearch can use a large amount of RAM. However, I've tested it on small instances and it works even with only 128m. The main configuration files are located in the directory /etc/elasticsearch/. You can change the amount of RAM in use by editing the relevant lines in the file jvm.options. Note that the Xms and Xmx values must be equal.

sudo nano /etc/elasticsearch/jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

#-Xms512m
#-Xmx512m
-Xms4g
-Xmx4g
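
After editing the file, the two active values can be compared before the service is restarted. The following is only a minimal sketch: the function name check_heap_options is hypothetical, and it assumes the heap sizes are written as plain -Xms…/-Xmx… lines, as in the listing above.

```shell
#!/bin/bash
# Read the active (uncommented) -Xms and -Xmx values from a jvm.options
# file and report whether they are equal.
# Usage: check_heap_options FILE
check_heap_options() {
    local file="$1" xms xmx
    # Commented lines start with '#', so anchoring at '^-' skips them.
    xms="$(grep -E '^-Xms' "$file" | tail -n 1)"
    xmx="$(grep -E '^-Xmx' "$file" | tail -n 1)"
    xms="${xms#-Xms}"
    xmx="${xmx#-Xmx}"
    if [ -n "$xms" ] && [ "$xms" = "$xmx" ]; then
        echo "OK: heap fixed at ${xms}"
    else
        echo "MISMATCH: Xms='${xms}' Xmx='${xmx}'"
        return 1
    fi
}
```

For example, run check_heap_options /etc/elasticsearch/jvm.options from a root shell, since the file may not be readable by ordinary users.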

Add a Restart=always directive to Elasticsearch's systemd unit.

sudo systemctl edit elasticsearch.service
[Service]
# SZS/MLT Tweak
Restart=always
RestartSec=3

To apply the changes, use the following commands.

sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
systemctl status elasticsearch.service
systemctl cat elasticsearch.service
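
Elasticsearch usually needs a little time after a restart before it answers HTTP requests, so a check made immediately afterwards can fail even though the service is fine. A small retry helper covers that gap. This is a sketch under assumptions: the name retry_until is hypothetical, and the usage line assumes the service listens on http://127.0.0.1:9200.

```shell
#!/bin/bash
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry_until() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        # The command's own output is discarded; only its exit code matters.
        if "$@" >/dev/null 2>&1; then
            echo "ready after ${i} attempt(s)"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "not ready after ${attempts} attempt(s)"
    return 1
}
```

A possible call after the restart is retry_until 30 2 curl -fsS http://127.0.0.1:9200, which gives the service up to about a minute to come up.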

MediaWiki Setup

The main purpose of this guide is to show how to set up Elasticsearch for use by MediaWiki's CirrusSearch extension, and this section describes how to do that. In addition, the AdvancedSearch extension will be installed and configured.

If you have installed the PdfHandler extension (or some other file-handling extension), CirrusSearch will show results from the files' content; the configuration below shows how to boost these results. How to configure the Translate extension to use Elasticsearch is described in MediaWiki's documentation, in the article Translation memories.

Install the Extensions Bundle

First of all, you need to install the extensions within MediaWiki's document root. The following example uses the Download from Git approach.

IP="/var/www/wiki.example.com" # The DocumentRoot directory of the wiki
OWNER="www-data"               # The user that owns the $IP directory
BRANCH="REL1_39"               # The MediaWiki's branch in use
cd "$IP/extensions"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/AdvancedSearch --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch --branch ${BRANCH}
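sudo chown -R "${OWNER}:${OWNER}" AdvancedSearch/ Elastica/ CirrusSearch/
# Install the PHP dependencies inside each extension's own directory
for ext in Elastica CirrusSearch; do (cd "$ext" && sudo -u "${OWNER}" composer install --no-dev); done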
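
Before LocalSettings.php is touched, it can help to confirm that each clone really landed where wfLoadExtension() will look for it, which is the extension.json file in the extension's directory. The helper below is a sketch; the name check_extensions is hypothetical.

```shell
#!/bin/bash
# Verify that each named extension has an extension.json file.
# Usage: check_extensions EXTENSIONS_DIR NAME [NAME...]
check_extensions() {
    local ext_dir="$1" ext missing=0
    shift
    for ext in "$@"; do
        if [ -f "${ext_dir}/${ext}/extension.json" ]; then
            echo "${ext}: ok"
        else
            echo "${ext}: missing"
            missing=1
        fi
    done
    # Non-zero exit code if at least one extension was not found.
    return "$missing"
}
```

With the variables from the listing above, the call would be check_extensions "$IP/extensions" AdvancedSearch Elastica CirrusSearch.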

LocalSettings.php Configuration

Open the configuration file with your favorite editor and place the following lines at a suitable place (the end of the file is a good place). The example below shows the current configuration of this wiki. After the search index is built (next section), CirrusSearch should work even without the advanced setup. More options are described on the Extension:CirrusSearch page; some undocumented options can also be found in its extension.json file.

sudo nano "$IP/LocalSettings.php"
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
$wgAdvancedSearchDeepcatEnabled = false;    // https://www.mediawiki.org/wiki/Topic:Uw036nwsilvb6w3t
$wgAdvancedSearchBetaFeature = false;       // (enable it by default) https://m.mediawiki.org/wiki/Topic:Upflskaswcvrunka
$wgAdvancedSearchHighlighting = true;       // https://www.mediawiki.org/wiki/Manual:Configuration_settings_(alphabetical)
$wgOpenSearchDescriptionLength = 2500;      // https://www.mediawiki.org/wiki/Manual:$wgOpenSearchDescriptionLength

## Extension:Elastica
wfLoadExtension( 'Elastica' );

## Extension:CirrusSearch
wfLoadExtension( 'CirrusSearch' );
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log";
// $wgCirrusSearchIndexBaseName = 'wiki_db_name';       // https://www.mediawiki.org/wiki/Extension:CirrusSearch#Configuration
// $wgCirrusSearchServers = [ '10.120.201.1' ];         // The address of the Elasticsearch server if it is not available at 'localhost'

## Extension:CirrusSearch Advanced Setup
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder';
// $wgCirrusSearchCompletionProfiles = 'normal';
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50
$wgCirrusSearchFragmentSize = 200;
// $wgCirrusExploreSimilarResults = true;

// Give more weight to "file_text" in order to show
// results from the PDFs' content. This requires PdfHandler.
$wgCirrusSearchWeights = [
    "title" => 20,
    "redirect" => 15,
    "category" => 8,
    "heading" => 5,
    "opening_text" => 3,
    "text" => 5,
    "auxiliary_text" => 15,
    "file_text" => 25
];

// https://www.mediawiki.org/wiki/Help:Namespaces#Localisation
$wgCirrusSearchNamespaceWeights = [
    "2" => 0.05,
    "4" => 0.3,
    "6" => 0.2,
    "8" => 0.05,
    "10" => 0.005,
    "12" => 0.2,
    "14" => 0.1
];

Build Search Index

How to build and update the CirrusSearch/Elasticsearch index is well described in the README and UPGRADE documents that come with the extension. Here the important steps for building the index for the first time are extracted and converted to a script, one for a single wiki and one for a wiki family.

sudo nano /usr/local/bin/"mlw-cirrussearch-elasticsearch-build-index-single-wiki.sh"
#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @source    https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README
# @home      https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install   Create executable file within and place this code as its content:
#            /usr/local/bin/mlw-cirrussearch-elasticsearch-build-index-single-wiki.sh
#
# @desc      Create the Elasticsearch index for a single MediaWiki

: ${IP:="/var/www/wiki.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}               # The user that owns the $IP directory

CS_MAINT_DIR="${IP}/extensions/CirrusSearch/maintenance"

## STEP 1
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Disable CirrusSearch and Search update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' "${IP}/LocalSettings.php"
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' "${IP}/LocalSettings.php"

printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5

## STEP 2
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Generate Elasticsearch Index" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSearchIndexConfig.php" --startOver --conf "${IP}/LocalSettings.php"

## STEP 3
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Search Update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' "${IP}/LocalSettings.php"

printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5

## STEP 4
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Bootstrap the Search Index" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipLinks --indexOnSkip --conf "${IP}/LocalSettings.php"
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipParse --conf "${IP}/LocalSettings.php"

## STEP 5
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Cirrus Search" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' "${IP}/LocalSettings.php"

printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5

## Step 6
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Update Cirrus Search Suggestions (if the option is enabled in LocalSettings.php)" "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSuggesterIndex.php" --conf "${IP}/LocalSettings.php"

## Step 7 - this is the most time-consuming step; you could skip it and run it later...
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Run Job Queue" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${IP}/maintenance/runJobs.php" --conf "${IP}/LocalSettings.php"

sudo nano /usr/local/bin/"mlw-cirrussearch-elasticsearch-build-index-wiki-family.sh"
#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @source    https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README
# @home      https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install   Create executable file within and place this code as its content:
#            /usr/local/bin/mlw-cirrussearch-elasticsearch-build-index-wiki-family.sh
#
# @desc      Create the Elasticsearch index for a MediaWiki family
#            Note: In this scenario the family members share the same DocumentRoot
#            and LocalSettings.php file (and Apache2 virtual host configuration).

: ${IP:="/var/www/wiki-family.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}                      # The user that owns the $IP directory
: ${WIKI_IDs:="bg" "en" "ru" "commons"}     # The IDs of the wikis that belong to the family

CS_MAINT_DIR="${IP}/extensions/CirrusSearch/maintenance"

## STEP 1
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Disable CirrusSearch and Search update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' "${IP}/LocalSettings.php"
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' "${IP}/LocalSettings.php"

printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5

## STEP 2
for WIKI_ID in ${WIKI_IDs[@]}
do
    printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Generate Elasticsearch Index" "${IP##*/}::${WIKI_ID}"
    echo; sleep 5
    sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSearchIndexConfig.php" --startOver --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
done

## STEP 3
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Search Update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' "${IP}/LocalSettings.php"

printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5

## STEP 4
for WIKI_ID in ${WIKI_IDs[@]}
do
    printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Bootstrap the Search Index" "${IP##*/}::${WIKI_ID}"
    echo; sleep 5
    sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipLinks --indexOnSkip --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
    sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipParse --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
    sleep 2
done

## STEP 5
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Cirrus Search" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' "${IP}/LocalSettings.php"

printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5

## Step 6
for WIKI_ID in ${WIKI_IDs[@]}
do
    printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Update Cirrus Search Suggestions (if the option is enabled in LocalSettings.php)" "${IP##*/}::${WIKI_ID}"
    echo; sleep 5
    sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSuggesterIndex.php" --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
done

## Step 7 - this is the most time-consuming step; you could skip it and run it later...
for WIKI_ID in ${WIKI_IDs[@]}
do
    printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Run Job Queue"  "${IP##*/}::${WIKI_ID}"
    echo; sleep 5
    sudo -u "$OWNER" /usr/bin/php "${IP}/maintenance/runJobs.php" --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
done
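
Both scripts switch $wgSearchType and $wgDisableSearchUpdate on and off by commenting and uncommenting lines with sed. That only works when the settings start at the very beginning of a line in LocalSettings.php, as they do in the configuration above. The toggle can be tried safely on a scratch copy first. The helper below is a sketch of the same technique with a hypothetical name, toggle_setting, and it assumes GNU sed (for the -i option), as shipped with Ubuntu.

```shell
#!/bin/bash
# Comment out ("off") or re-enable ("on") a PHP setting that starts
# at the beginning of a line.
# Usage: toggle_setting off|on '$wgName' FILE
toggle_setting() {
    local mode="$1" name="$2" file="$3"
    case "$mode" in
        # The '^' anchor keeps both directions idempotent: a line that is
        # already commented out does not match '^$wgName' a second time.
        off) sed -i "s#^${name}#// ${name}#" "$file" ;;
        on)  sed -i "s#^// ${name}#${name}#" "$file" ;;
        *)   echo "usage: toggle_setting off|on NAME FILE" >&2; return 2 ;;
    esac
}
```

For instance, copy LocalSettings.php to /tmp, run toggle_setting off '$wgSearchType' on the copy, and inspect the result with grep before trusting the scripts with the real file.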

Update

When you update CirrusSearch and/or Elasticsearch there are several possible cases, which are well described in CirrusSearch's documentation files README and UPGRADE.

One possible way is to use the extension's maintenance scripts as shown below, or you can rebuild the entire search index as shown above :)

cd "$IP/extensions/CirrusSearch"
sudo -u $OWNER php maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
sudo -u $OWNER php maintenance/Metastore.php --upgrade

To monitor the filesystem changes during the update you can use a command like the following.

sudo watch "du -hs /var/lib/mysql/wiki_id && \
du -hs /var/lib/elasticsearch && \
php /var/www/wiki.metalevel.tech/maintenance/showJobs.php --type cirrusSearchElasticaWrite"

Additional Setup

Access Elasticsearch via SSH Tunnel

This approach is suitable only for testing purposes; here is a manual on how to set it up:

Elasticsearch Watch Scripts

Here are two example scripts that cover the following scenarios: [1] the Elasticsearch service runs on the same instance where it is used; and [2] the Elasticsearch service runs on another host (instance) and we must be sure it is available there.

sudo nano /usr/local/bin/"mlw-elasticsearch-watch-local.sh"
#!/bin/bash -e

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @home      https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install   Create executable file within and place this code as its content:
#            /usr/local/bin/mlw-elasticsearch-watch-local.sh
#
# @desc      Test whether Elasticsearch is accessible; if not, attempt to restart it and send an email notification

: ${EMAIL_SENDER:="admin@example.com"}                 # The email address of the responsible person
: ${EMAIL_ADMIN:="admin@example.com"}                  # The email address which sends the email
: ${EMAIL_BODY:="/tmp/elasticsearch-watch.email.body"} # Temporary file where the email body will be stored

if /usr/bin/curl 'http://127.0.0.1:9200' 2>&1 | /bin/grep -q 'Connection refused'
then
    {
        /bin/date
        echo
        echo "Elasticsearch failed and will be restarted..."
        /usr/bin/systemctl start elasticsearch.service
        /usr/bin/systemctl restart elasticsearch.service
    } > "$EMAIL_BODY" 2>&1

    /usr/bin/mail   -r "ElasticSearch Watch ${EMAIL_ADMIN}" \
                    -s "ElasticSearch was Restarted" "${EMAIL_SENDER}" \
                    -a "MIME-Version: 1.0" -a "Content-Type: text/html; charset=UTF-8" < "$EMAIL_BODY"
fi

sudo nano /usr/local/bin/"mlw-elasticsearch-watch-remote.sh"
#!/bin/bash -e

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2021 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @home      https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install   Create executable file within and place this code as its content:
#            /usr/local/bin/mlw-elasticsearch-watch-remote.sh
#
# @desc      Test whether Elasticsearch is accessible; if not, attempt to restart it and send an email notification
#            Here the test is done via SSH login to a remote instance where Elasticsearch is used

: ${EMAIL_SENDER:="admin@example.com"}                 # The email address of the responsible person
: ${EMAIL_ADMIN:="admin@example.com"}                  # The email address which sends the email
: ${EMAIL_BODY:="/tmp/elasticsearch-watch.email.body"} # Temporary file where the email body will be stored
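# Note: bash always sets HOSTNAME to the local host name, so a default assigned
# to it would never take effect; a separate variable is used instead.
: ${ES_HOST:="example.com"}                            # A host alias defined in the ssh/config file

if /usr/bin/ssh "$ES_HOST" "curl 'http://127.0.0.1:9200' 2>&1" | /bin/grep -q 'Connection refused'
then
    {
        /bin/date
        echo
        echo "Elasticsearch on the remote instance ${ES_HOST} failed and will be restarted..."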
        /usr/bin/systemctl start autossh-port-forward.service
        /usr/bin/systemctl start elasticsearch.service
        /usr/bin/systemctl restart autossh-trivictoria.service
        /usr/bin/systemctl restart elasticsearch.service
    } > "$EMAIL_BODY" 2>&1

    /usr/bin/mail   -r "ElasticSearch Watch ${EMAIL_ADMIN}" \
                    -s "ElasticSearch was Restarted" "${EMAIL_SENDER}" \
                    -a "MIME-Version: 1.0" -a "Content-Type: text/html; charset=UTF-8" < "$EMAIL_BODY"
fi
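
Both watch scripts react only to the "Connection refused" message from curl. A possible refinement, offered here only as a sketch, is to ask Elasticsearch for its cluster health and act on the reported status. The _cluster/health endpoint returns a JSON document with a "status" field of green, yellow or red. The function names below are hypothetical, and the parsing is deliberately simple (sed instead of a JSON parser), so it assumes the field appears once in the response.

```shell
#!/bin/bash
# Extract the "status" field (green, yellow or red) from a
# _cluster/health JSON response read from standard input.
es_health_status() {
    sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Succeed for green or yellow; fail for red, for an empty response,
# or for anything that is not a health document (e.g. a curl error).
es_is_healthy() {
    case "$(es_health_status)" in
        green|yellow) return 0 ;;
        *)            return 1 ;;
    esac
}
```

In a watch script the condition could then read: curl -s 'http://127.0.0.1:9200/_cluster/health' | es_is_healthy. Yellow is accepted here on the assumption of a single-node setup, where replica shards cannot be assigned; adjust that if your cluster is expected to be green.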

To run the test periodically you can create a simple systemd service and timer. A pretty simple example of this approach can be found in Gunicorn's documentation. Another way is to create a crontab entry as follows.

sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/mlw-elasticsearch-watch-local.sh
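
For the systemd alternative, a minimal pair of units might look like the ones written by the helper below. This is a sketch, not the configuration used on this wiki: the unit names elasticsearch-watch.service and elasticsearch-watch.timer and the function name write_watch_units are hypothetical, and the script path defaults to the local watch script from the previous section.

```shell
#!/bin/bash
# Write a oneshot service and a 5-minute timer for the watch script.
# Usage: write_watch_units [UNIT_DIR] [SCRIPT_PATH]
write_watch_units() {
    local unit_dir="${1:-/etc/systemd/system}"
    local script="${2:-/usr/local/bin/mlw-elasticsearch-watch-local.sh}"

    # The service runs the script once and exits.
    cat > "${unit_dir}/elasticsearch-watch.service" <<EOF
[Unit]
Description=Check that Elasticsearch is reachable

[Service]
Type=oneshot
ExecStart=${script}
EOF

    # The timer starts the service of the same name every 5 minutes.
    cat > "${unit_dir}/elasticsearch-watch.timer" <<EOF
[Unit]
Description=Run the Elasticsearch watch script every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
EOF
}
```

After calling write_watch_units from a root shell, the timer would be activated with systemctl daemon-reload followed by systemctl enable --now elasticsearch-watch.timer.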

Disable CirrusSearch via PHP if Elasticsearch is not available

Another thing that can be done to make sure your wiki works correctly is to check within LocalSettings.php whether Elasticsearch is available. This can be done with an implementation of the following code.

<?php
function isPortOpen($ipAddress, $portToCheck) {
    $fp = @fsockopen($ipAddress, $portToCheck, $errno, $errstr, 0.1);
    if (!$fp) {
        return false;
    } else {
        fclose($fp);
        return true;
    }
}

if (isPortOpen('127.0.0.1', 9300)) {
    echo '9300 Open';
} else {
    echo '9300 Closed';
}
$wgSearchType = 'CirrusSearch';

The LocalSettings.php im­ple­men­ta­tion could look like:

// Enable CirrusSearch only when the Elasticsearch port accepts connections
if ($fp = @fsockopen('127.0.0.1', 9300, $errno, $errstr, 0.1)) {
    fclose($fp);
    $wgSearchType = 'CirrusSearch';
}
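
To see from the command line what such a PHP check would observe, the same kind of probe can be made in the shell. The sketch below uses bash's built-in /dev/tcp redirection with a one-second timeout; the function name port_state is hypothetical, and it assumes bash and the coreutils timeout command are available.

```shell
#!/bin/bash
# Report whether a TCP port accepts connections, using bash's /dev/tcp.
# Usage: port_state HOST PORT
port_state() {
    local host="$1" port="$2"
    # The inner bash opens the connection on descriptor 3 and exits;
    # 'timeout' stops it if the host silently drops the packets.
    if timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}
```

Calling port_state 127.0.0.1 9300 on the wiki host shows whether the condition in LocalSettings.php would currently enable CirrusSearch.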

References