MediaWiki Extension CirrusSearch and Elasticsearch Setup: Difference between revisions

From WikiMLT
Spas (talk | contribs)
Spas (talk | contribs)
Line 894: Line 894:
Open the configuration file with your favorite editor and place the following lines at suitable place (the end of the file is good place). In the example below is shown the current configuration of this wiki. After the building of the search index (next section) CirrusSearch should work without the advanced setup. More options are described in the [[mw:Extension:CirrusSearch#Configuration|Extension:CirrusSearch]] page, also some undocumented options could be found within its <code>[https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/HEAD/extension.json extension.json]</code> file.<syntaxhighlight lang="shell" line="1">
Open the configuration file with your favorite editor and place the following lines at suitable place (the end of the file is good place). In the example below is shown the current configuration of this wiki. After the building of the search index (next section) CirrusSearch should work without the advanced setup. More options are described in the [[mw:Extension:CirrusSearch#Configuration|Extension:CirrusSearch]] page, also some undocumented options could be found within its <code>[https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/HEAD/extension.json extension.json]</code> file.<syntaxhighlight lang="shell" line="1">
sudo nano "$IP/LocalSettings.php"
sudo nano "$IP/LocalSettings.php"
</syntaxhighlight><syntaxhighlight lang="php" line="1" start="750">
</syntaxhighlight>
<syntaxhighlight lang="php" line="1" start="750" class="mlw-pre-max-height-320">
## Extension:AdvancedSearch
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
wfLoadExtension( 'AdvancedSearch' );

Revision as of 15:49, 30 August 2022

This is a short man­u­al how to set-up Elas­tic­search to be used with the MediaWiki's ex­ten­sion Cir­rusSearch which com­mu­ni­cate to the ser­vice by the ex­ten­sion Elas­ti­ca. You should choice an ap­pro­pri­ate Elas­tic­search ver­sion de­pend­ing on your Me­di­aWi­ki ver­sion. Cur­rent­ly I'm us­ing Me­di­aWi­ki 1.38 and it is rec­om­mend­ed to use Elas­tic­search 6.8.23+ with it. This ver­sion runs well over open­jdk-11 which is the de­fault Ja­va ver­sion on Ubun­tu Serv­er 22.04.

Elas­tic­search and the ex­ten­sion Elas­ti­ca are re­quired by some oth­er Me­di­aWi­ki ex­ten­sions as ex­ten­sion Trans­late where it is used as trans­la­tion mem­o­ry. It is al­so used by the NextCoud's ap­pli­ca­tion Full text search and more…

Ja­va Set­up

On Ubun­tu Serv­er the de­fault jdk and jre pack­ages can be in­stalled by the fol­low­ing com­mand.

sudo apt install -y apt-transport-https default-jdk default-jre

To check and switch the cur­rent ver­sion of Ja­va and Javac you can use the fol­low­ing com­mands.

sudo update-alternatives --config java
#Out­put
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
#Out­put
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                          Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/bin/javac    1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1

If you are us­ing Elas­tic­search 5.x it re­quires openjdk‑8 which can be in­stalled by the fol­low­ing com­mands. Af­ter the in­stal­la­tion use the above com­mands to switch the ver­sion in use.

#De­tails
sudo apt install openjdk-8-jre-headless 
sudo apt install openjdk-8-jdk-headless

Af­ter switch­ing the ver­sion of Ja­va you need to restart the Elas­tic­search ser­vice if it is al­ready in­stalled.

sudo systemctl restart elasticsearch.service 
curl 'http://127.0.0.1:9200' # do a test

Elas­tic­search Set­up

In­stal­la­tion

There is a cou­ple of ways how to In­stalling Elas­tic­search – via Dock­er, via Apt repos­i­to­ry, via .deb or .rpm pack­ages, etc. I pre­fer to man­u­al­ly down­load and in­stall it via .deb pack­age. Is I said be­fore for Me­di­aWi­ki 1.38 we need ver­sion 6.8.23+.

cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
sudo apt install ./elasticsearch-6.8.23.deb
#Ver­sions
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb

Af­ter in­stalling the pack­age the Elas­tic­search ser­vice must be en­abled and start­ed.

sudo systemctl enable --now elasticsearch.service   # enable and start the service
systemctl status elasticsearch.service              # check the status of the service
systemctl cat elasticsearch.service                 # check the current service's configuration

Check

You can check does the ser­vice work prop­er­ly by the fol­low­ing ap­proach.

curl 'http://127.0.0.1:9200'
#Out­put
{
  "name" : "W2uxKNc",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "JwkNoPi_THuiCA123-HKMg",
  "version" : {
    "number" : "6.8.23",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4f67856",
    "build_date" : "2022-01-06T21:30:50.087716Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.3",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}

More de­tailed in­for­ma­tion can be ob­tained by the next com­mand.

curl -XGET 'http://localhost:9200/_nodes?pretty'
#Out­put
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "W2uxKNc9SQqZSVN4RIZmNg" : {
      "name" : "W2uxKNc",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "6.8.23",
      "build_flavor" : "default",
      "build_type" : "deb",
      "build_hash" : "4f67856",
      "total_indexing_buffer" : 418159001,
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "24456527872",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "settings" : {
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "cluster" : {
          "name" : "elasticsearch"
        },
        "node" : {
          "attr" : {
            "xpack" : {
              "installed" : "true"
            },
            "ml" : {
              "machine_memory" : "24456527872",
              "max_open_jobs" : "20",
              "enabled" : "true"
            }
          },
          "name" : "W2uxKNc"
        },
        "path" : {
          "data" : [
            "/var/lib/elasticsearch"
          ],
          "logs" : "/var/log/elasticsearch",
          "home" : "/usr/share/elasticsearch"
        },
        "client" : {
          "type" : "node"
        },
        "http" : {
          "type" : "security4",
          "type.default" : "netty4"
        },
        "transport" : {
          "type" : "security4",
          "features" : {
            "x-pack" : "true"
          },
          "type.default" : "netty4"
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "name" : "Linux",
        "pretty_name" : "Ubuntu 22.04.1 LTS",
        "arch" : "amd64",
        "version" : "5.15.0-46-generic",
        "available_processors" : 16,
        "allocated_processors" : 16
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 1041,
        "mlockall" : false
      },
      "jvm" : {
        "pid" : 1041,
        "version" : "11.0.16",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "11.0.16+8-post-Ubuntu-0ubuntu122.04",
        "vm_vendor" : "Ubuntu",
        "start_time_in_millis" : 1661769755777,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4181590016,
          "non_heap_init_in_bytes" : 7667712,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 0
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "CodeHeap 'non-nmethods'",
          "Metaspace",
          "CodeHeap 'profiled nmethods'",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CodeHeap 'non-profiled nmethods'",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true",
        "input_arguments" : [
          "-Xms4g",
          "-Xmx4g",
          "-XX:+UseConcMarkSweepGC",
          "-XX:CMSInitiatingOccupancyFraction=75",
          "-XX:+UseCMSInitiatingOccupancyOnly",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Dlog4j2.formatMsgNoLookups=true",
          "-Djava.io.tmpdir=/tmp/elasticsearch-14060835447651286248",
          "-XX:+HeapDumpOnOutOfMemoryError",
          "-XX:HeapDumpPath=/var/lib/elasticsearch",
          "-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log",
          "-Djava.locale.providers=COMPAT",
          "-XX:UseAVX=2",
          "-Des.path.home=/usr/share/elasticsearch",
          "-Des.path.conf=/etc/elasticsearch",
          "-Des.distribution.flavor=default",
          "-Des.distribution.type=deb"
        ]
      },
      "thread_pool" : {
        "watcher" : {
          "type" : "fixed",
          "min" : 50,
          "max" : 50,
          "queue_size" : 1000
        },
        "force_merge" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : -1
        },
        "security-token-key" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : 1000
        },
        "ml_datafeed" : {
          "type" : "fixed",
          "min" : 20,
          "max" : 20,
          "queue_size" : 200
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : -1
        },
        "ml_autodetect" : {
          "type" : "fixed",
          "min" : 80,
          "max" : 80,
          "queue_size" : 80
        },
        "index" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 200
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 8,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "generic" : {
          "type" : "scaling",
          "min" : 4,
          "max" : 128,
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "rollup_indexing" : {
          "type" : "fixed",
          "min" : 4,
          "max" : 4,
          "queue_size" : 4
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search" : {
          "type" : "fixed_auto_queue_size",
          "min" : 25,
          "max" : 25,
          "queue_size" : 1000
        },
        "ccr" : {
          "type" : "fixed",
          "min" : 32,
          "max" : 32,
          "queue_size" : 100
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "ml_utility" : {
          "type" : "fixed",
          "min" : 80,
          "max" : 80,
          "queue_size" : 500
        },
        "get" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 1000
        },
        "analyze" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : 16
        },
        "write" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 200
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search_throttled" : {
          "type" : "fixed_auto_queue_size",
          "min" : 1,
          "max" : 1,
          "queue_size" : 100
        }
      },
      "transport" : {
        "bound_address" : [
          "[::1]:9300",
          "127.0.0.1:9300"
        ],
        "publish_address" : "127.0.0.1:9300",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : [
          "[::1]:9200",
          "127.0.0.1:9200"
        ],
        "publish_address" : "127.0.0.1:9200",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ ],
      "modules" : [
        {
          "name" : "aggs-matrix-stats",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
          "classname" : "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "analysis-common",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds \"built in\" analyzers to Elasticsearch.",
          "classname" : "org.elasticsearch.analysis.common.CommonAnalysisPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-common",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
          "classname" : "org.elasticsearch.ingest.common.IngestCommonPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-geoip",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
          "classname" : "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-user-agent",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Ingest processor that extracts information from a user agent",
          "classname" : "org.elasticsearch.ingest.useragent.IngestUserAgentPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-expression",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Lucene expressions integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.expression.ExpressionPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-mustache",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Mustache scripting integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.mustache.MustachePlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-painless",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "An easy, safe and fast scripting language for Elasticsearch",
          "classname" : "org.elasticsearch.painless.PainlessPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "mapper-extras",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds advanced field mappers",
          "classname" : "org.elasticsearch.index.mapper.MapperExtrasPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "parent-join",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "This module adds the support parent-child queries and aggregations",
          "classname" : "org.elasticsearch.join.ParentJoinPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "percolator",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Percolator module adds capability to index queries and query these queries by specifying documents",
          "classname" : "org.elasticsearch.percolator.PercolatorPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "rank-eval",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Rank Eval module adds APIs to evaluate ranking quality.",
          "classname" : "org.elasticsearch.index.rankeval.RankEvalPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "reindex",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
          "classname" : "org.elasticsearch.index.reindex.ReindexPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "repository-url",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Module for URL repository",
          "classname" : "org.elasticsearch.plugin.repository.url.URLRepositoryPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "transport-netty4",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Netty 4 based transport implementation",
          "classname" : "org.elasticsearch.transport.Netty4Plugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "tribe",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Tribe module",
          "classname" : "org.elasticsearch.tribe.TribePlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ccr",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - CCR",
          "classname" : "org.elasticsearch.xpack.ccr.Ccr",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-core",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Core",
          "classname" : "org.elasticsearch.xpack.core.XPackPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-deprecation",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Deprecation",
          "classname" : "org.elasticsearch.xpack.deprecation.Deprecation",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-graph",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Graph",
          "classname" : "org.elasticsearch.xpack.graph.Graph",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ilm",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Index Lifecycle Management",
          "classname" : "org.elasticsearch.xpack.indexlifecycle.IndexLifecycle",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-logstash",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Logstash",
          "classname" : "org.elasticsearch.xpack.logstash.Logstash",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ml",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Machine Learning",
          "classname" : "org.elasticsearch.xpack.ml.MachineLearning",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : true
        },
        {
          "name" : "x-pack-monitoring",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Monitoring",
          "classname" : "org.elasticsearch.xpack.monitoring.Monitoring",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-rollup",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Rollup",
          "classname" : "org.elasticsearch.xpack.rollup.Rollup",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-security",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Security",
          "classname" : "org.elasticsearch.xpack.security.Security",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-sql",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Elasticsearch plugin that powers SQL for Elasticsearch",
          "classname" : "org.elasticsearch.xpack.sql.plugin.SqlPlugin",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-upgrade",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Upgrade",
          "classname" : "org.elasticsearch.xpack.upgrade.Upgrade",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-watcher",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Watcher",
          "classname" : "org.elasticsearch.xpack.watcher.Watcher",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        }
      ],
      "ingest" : {
        "processors" : [
          {
            "type" : "append"
          },
          {
            "type" : "bytes"
          },
          {
            "type" : "convert"
          },
          {
            "type" : "date"
          },
          {
            "type" : "date_index_name"
          },
          {
            "type" : "dissect"
          },
          {
            "type" : "dot_expander"
          },
          {
            "type" : "drop"
          },
          {
            "type" : "fail"
          },
          {
            "type" : "foreach"
          },
          {
            "type" : "geoip"
          },
          {
            "type" : "grok"
          },
          {
            "type" : "gsub"
          },
          {
            "type" : "join"
          },
          {
            "type" : "json"
          },
          {
            "type" : "kv"
          },
          {
            "type" : "lowercase"
          },
          {
            "type" : "pipeline"
          },
          {
            "type" : "remove"
          },
          {
            "type" : "rename"
          },
          {
            "type" : "script"
          },
          {
            "type" : "set"
          },
          {
            "type" : "set_security_user"
          },
          {
            "type" : "sort"
          },
          {
            "type" : "split"
          },
          {
            "type" : "trim"
          },
          {
            "type" : "uppercase"
          },
          {
            "type" : "urldecode"
          },
          {
            "type" : "user_agent"
          }
        ]
      }
    }
  }
}

Tweaks

Elas­tic­search could use huge amount of RAM. But, I've test­ed it for thin in­stances it work even with on­ly 128m. The main con­fig­u­ra­tion files are lo­cat­ed in­to the di­rec­to­ry /​​​etc/​​​elasticsearch/​​​. You can tweak the amount of Ram in use by tweak­ing the rel­e­vant lines in the file jvm.options. Note Xms and Xmx val­ues must be equal.

sudo nano /etc/elasticsearch/jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

#-Xms512m
#-Xmx512m
-Xms4g
-Xmx4g

Add restart al­ways di­rec­tive to the Elasticsearch's sys­temd unit.

sudo systemctl edit elasticsearch.service
[Service]
# SZS/MLT Tweak
Restart=always
RestartSec=3

To ap­ply the changes use the fol­low­ing com­mands.

sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
systemctl status elasticsearch.service
systemctl cat elasticsearch.service

Me­di­aWi­ki Set­up

The main pur­pose of this guide is how to set­up Elas­tic­search to be used by MediaWiki's ex­ten­sion Cir­rusSearch, so in this sec­tion we will de­scribe how to do that. In ad­di­tion al­so the ex­ten­sion Ad­vanced­Search will be in­stalled and con­fig­ured.

If you have in­stalled the ex­ten­sion PdfHan­dler (or some oth­er file han­dling ex­ten­sion) Cir­rusSearch will show re­sults from the files con­tent – in the con­fig­u­ra­tion be­low is shown how to boost these re­sults. How to con­fig­ure ex­ten­sion Trans­late to use Elas­tic­search is de­cried in the MediaWiki's doc­u­men­ta­tion in the ar­ti­cle Trans­la­tion mem­o­ries.

In­stall the Ex­ten­sions

First of all you need to in­stall the ex­ten­sions with­in the MediaWiki's doc­u­ment root. In the fol­low­ing ex­am­ple is used the ap­proach Down­load from Git.

: ${IP:="/var/www/wiki.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}               # The user that owns the $IP directory
: ${BRANCH:="REL1_38"}               # The MediaWiki's branch in use
cd "$IP/extensions"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/AdvancedSearch --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch --branch ${BRANCH}
sudo chown -R ${Owner}:${Owner} Elastica/ CirrusSearch/
for ext in Elastica CirrusSearch; do sudo -u ${Owner} composer install --no-dev; done

LocalSettings.php Con­fig­u­ra­tion

Open the con­fig­u­ra­tion file with your fa­vorite ed­i­tor and place the fol­low­ing lines at suit­able place (the end of the file is good place). In the ex­am­ple be­low is shown the cur­rent con­fig­u­ra­tion of this wi­ki. Af­ter the build­ing of the search in­dex (next sec­tion) Cir­rusSearch should work with­out the ad­vanced set­up. More op­tions are de­scribed in the Extension:CirrusSearch page, al­so some un­doc­u­ment­ed op­tions could be found with­in its extension.json file.

sudo nano "$IP/LocalSettings.php"
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
$wgAdvancedSearchDeepcatEnabled = false;    // https://www.mediawiki.org/wiki/Topic:Uw036nwsilvb6w3t
$wgAdvancedSearchBetaFeature = false;       // (enable it by default) https://m.mediawiki.org/wiki/Topic:Upflskaswcvrunka
$wgAdvancedSearchHighlighting = true;       // https://www.mediawiki.org/wiki/Manual:Configuration_settings_(alphabetical)
$wgOpenSearchDescriptionLength = 2500;      // https://www.mediawiki.org/wiki/Manual:$wgOpenSearchDescriptionLength

## Extension:Elastica
wfLoadExtension( 'Elastica' );

## Extension:CirrusSearch
wfLoadExtension( 'CirrusSearch' );
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log";
// $wgCirrusSearchIndexBaseName = 'wiki_db_name';       // https://www.mediawiki.org/wiki/Extension:CirrusSearch#Configuration
// $wgCirrusSearchServers = [ '10.120.201.1' ];         // The address of the Elasticsearch serer if it is not available at 'localhost'

## Extension:CirrusSearch Advanced Setup
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder';
// $wgCirrusSearchCompletionProfiles = 'normal';
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50
$wgCirrusSearchFragmentSize = 200;
// $wgCirrusExploreSimilarResults = true;

// Give much weight to the "file_text" in order to show 
// results from the PDFs content. This requires PdfHandler
$wgCirrusSearchWeights = [
    "title" => 20,
    "redirect" => 15,
    "category" => 8,
    "heading" => 5,
    "opening_text" => 3,
    "text" => 5,
    "auxiliary_text" => 15,
    "file_text" => 25
];

// https://www.mediawiki.org/wiki/Help:Namespaces#Localisation
$wgCirrusSearchNamespaceWeights = [
    "2" => 0.05,
    "4" => 0.3,
    "6" => 0.2,
    "8" => 0.05,
    "10" => 0.005,
    "12" => 0.2,
    "14" => 0.1
];

Build Search In­dex

How to build and up­date the CirrusSearch/​​​Elasticsearc in­dex is well de­scribed in the doc­u­ments README and UP­GRADE which comes with the ex­ten­sion. Here the im­por­tant steps re­lat­ed to the in­dex build­ing for a first time are ex­tract­ed and con­vert­ed to a script.

mlw-cirrussearch-elasticsearch-build-index-single-wiki.sh
#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @desc      Create elastic search index for a singel MediaWiki
#
# @install   Create executable file within /usr/local/bin/ and place this code as its content
# @source    https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README

: ${IP:="/var/www/wiki.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}               # The user that owns the $IP directory

# STEP 0
printf -- '\n**\nDisable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' $IP/LocalSettings.php
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' $IP/LocalSettings.php
echo -e "\n\n**\n$IP/LocalSettings.php\n*\n"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 5

# STEP 1
printf -- '\n**\nGenerate ElasticSearch Index for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver --conf $IP/LocalSettings.php
echo
sleep 5

# STEP 2
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' $IP/LocalSettings.php
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 5
printf -- '\n**\n Bootstrap the Search Index for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip --conf $IP/LocalSettings.php
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse --conf $IP/LocalSettings.php

# STEP 3
sleep 5
printf -- '\n**\nEnable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' $IP/LocalSettings.php
echo -e '\n\n**\n$IP/LocalSettings.php\n*\n'
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo

# Step 4
sleep 5
printf -- '\n**\nUpdate Cirrus Search Suggestions for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php --conf $IP/LocalSettings.php

За да за­поч­не ре­гу­ляр­но ин­дек­си­ра­не на съ­дър­жа­ни­е­то на уики­то, спря­мо кон­фи­гу­ра­ци­я­та, напра­ве­на в /var/­www/­*/­Local­Sett­ings.php и до­ку­мен­та­ци­я­та на mw:Extension­:­CirrusSearch тряб­ва да напра­ви пър­во­на­чал­на ин­дек­са­ция, да се из­пъл­нят за­да­чи­те, ко­и­то ще съз­да­де тя, да се ре­ге­не­ри­ра ин­дек­са на съ­дър­жа­ни­е­то и от­но­во да се из­праз­ни опаш­ка­та със за­да­чи­те. За цел­та мо­гат да се из­пол­з­ват скрип­то­ве­те за под­дръж­ка, опи­са­ни в сек­ци­я­та Me­di­aWi­ki.

mw-maintenance-elasticsearch-index.sh
mw-maintenance-runJobs.sh cli
mw-maintenance-rebuildAll.sh
mw-maintenance-runJobs.sh cli

Ad­di­tion­al Set­up

Ac­cess Elas­tic­search via SSH Tun­nel

Us­ing such ap­proach is suit­able on­ly for test pur­pose, here is a man­u­al how to set-up:

Elas­tic­search Watch Scripts

В до­пъл­не­ние е раз­ра­бо­тен скрип­та elasticsearch​-watch​.sh, ка­то чрез crontab за­да­ча се пра­ви пе­ри­о­дич­на про­вер­ка и при не­об­хо­ди­мост рес­тар­ти­ра­не. Скрип­та из­пра­ща пис­мо до vectoria@altclavis.com, ако настъ­пи съ­би­тие.

sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/elasticsearch-watch.sh

Ref­er­ences