MediaWiki Extension CirrusSearch and Elasticsearch Setup: Difference between revisions

From WikiMLT


== MediaWiki Setup ==
The main purpose of this guide is to show how to set up Elasticsearch for use by MediaWiki's extension [[mw:Extension:CirrusSearch|'''CirrusSearch''']], so this section describes how to do that. In addition, the extension [[mw:Extension:AdvancedSearch|AdvancedSearch]] will be installed and configured. How to configure the extension Translate to use Elasticsearch is described in MediaWiki's documentation, in the article [[mw:Help:Extension:Translate/Translation memories#ElasticSearch%20backend|Translation memories]].


=== Install the Extensions ===
First, you need to install the extensions within MediaWiki's document root. The following example uses the [[mw:Download from Git|Download from Git]] approach.


<syntaxhighlight lang="shell" line="1">
cd "$IP/extensions"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/AdvancedSearch --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch --branch ${BRANCH}
# Run composer inside each extension directory (a subshell keeps the current directory unchanged)
for ext in Elastica CirrusSearch; do (cd "$IP/extensions/$ext" && sudo -u "${Owner}" composer install --no-dev); done
</syntaxhighlight>
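`${BRANCH}` and `${Owner}` are not defined in this excerpt. Extension branches on Gerrit normally follow the core release (for example `REL1_38` for MediaWiki 1.38). A minimal sketch, assuming that naming scheme; the helper `mw_branch` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: turn a MediaWiki version such as "1.38.2" into the
# matching extension branch name ("REL1_38"), assuming the RELx_y scheme.
mw_branch() {
  case "$1" in
    *.*) ;;                      # need at least "major.minor"
    *) echo "usage: mw_branch <major.minor[.patch]>" >&2; return 1 ;;
  esac
  major=$(printf '%s\n' "$1" | cut -d. -f1)
  minor=$(printf '%s\n' "$1" | cut -d. -f2)
  printf 'REL%s_%s\n' "$major" "$minor"
}

# Example: BRANCH=$(mw_branch 1.38.2)   # -> REL1_38
```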
=== LocalSettings.php Configuration ===
Open the configuration file with your favorite editor and place the following lines at a suitable place (the end of the file is a good choice). The example below shows the current configuration of this wiki. After the search index has been built (next section), CirrusSearch should work even without the advanced setup.<syntaxhighlight lang="shell" line="1">
sudo nano "$IP/LocalSettings.php"
</syntaxhighlight><syntaxhighlight lang="php" line="1" start="750">
<?php
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
$wgAdvancedSearchDeepcatEnabled = false;    // https://www.mediawiki.org/wiki/Topic:Uw036nwsilvb6w3t
$wgAdvancedSearchBetaFeature = false;      // (enable it by default) https://m.mediawiki.org/wiki/Topic:Upflskaswcvrunka
$wgAdvancedSearchHighlighting = true;      // https://www.mediawiki.org/wiki/Manual:Configuration_settings_(alphabetical)
$wgOpenSearchDescriptionLength = 2500;      // https://www.mediawiki.org/wiki/Manual:$wgOpenSearchDescriptionLength
## Extension:Elastica
wfLoadExtension( 'Elastica' );
## Extension:CirrusSearch
wfLoadExtension( 'CirrusSearch' );
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log";
$wgCirrusSearchIndexBaseName = 'wiki_mlt_mlw';
// $wgCirrusSearchServers = [ '10.12.201.1' ];
// $wgCirrusExploreSimilarResults = true;
## Extension:CirrusSearch Advanced Setup
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50
$wgCirrusSearchFragmentSize = 200;
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder';
// $wgCirrusSearchCompletionProfiles = 'normal';
$wgCirrusSearchNamespaceWeights = [
    "2" => 0.05,
    "4" => 0.3,
    "6" => 0.2,
    "8" => 0.05,
    "10" => 0.005,
    "12" => 0.2,
    "14" => 0.1
];  // https://www.mediawiki.org/wiki/Help:Namespaces#Localisation
$wgCirrusSearchWeights = [
    "title" => 20,
    "redirect" => 15,
    "category" => 8,
    "heading" => 5,
    "opening_text" => 3,
    "text" => 5,
    "auxiliary_text" => 15,
    "file_text" => 25
];
</syntaxhighlight>
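Before building the index it can help to confirm that the file really loads the three extensions and sets `$wgSearchType`. A rough sketch, assuming the settings are written one per line as in the example above; `check_localsettings` is a hypothetical helper, and commented-out lines (`// ...`) are deliberately not counted:

```shell
#!/bin/sh
# Hypothetical helper: check that LocalSettings.php loads the search
# extensions and routes searches to CirrusSearch. Commented-out lines
# (e.g. "// $wgSearchType = ...") do not count, because the patterns
# require the setting at the start of the line.
check_localsettings() {
  file="$1"
  status=0
  for ext in Elastica CirrusSearch AdvancedSearch; do
    if grep -Eq "^[[:space:]]*wfLoadExtension\([[:space:]]*['\"]$ext['\"]" "$file"; then
      echo "ok: $ext is loaded"
    else
      echo "missing: wfLoadExtension( '$ext' );"
      status=1
    fi
  done
  if grep -Eq "^[[:space:]]*\\\$wgSearchType[[:space:]]*=[[:space:]]*['\"]CirrusSearch['\"]" "$file"; then
    echo "ok: \$wgSearchType is CirrusSearch"
  else
    echo "missing: \$wgSearchType = 'CirrusSearch';"
    status=1
  fi
  return "$status"
}

# Example: check_localsettings "$IP/LocalSettings.php"
```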
=== Build Search Index ===
To start regular indexing of the wiki content according to the configuration made in <code>/var/www/*/LocalSettings.php</code> and the documentation of [[Mw:Extension:CirrusSearch|mw:Extension:CirrusSearch]], you need to perform an initial indexing, run the jobs it creates, regenerate the content index and then empty the job queue again. The maintenance scripts described in the section MediaWiki can be used for this purpose.<syntaxhighlight lang="shell" line="1">
mw-maintenance-elasticsearch-index.sh
</syntaxhighlight>
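The content of `mw-maintenance-elasticsearch-index.sh` is not shown here. The sequence described above roughly corresponds to the bootstrap steps in the CirrusSearch README; the sketch below is an assumption-based reconstruction, not that script. `WIKI_USER` (default `www-data`) and the function names are hypothetical, and `DRY_RUN=1` only prints the commands:

```shell
#!/bin/sh
# Sketch of the initial CirrusSearch indexing sequence (not the author's script).
# Assumptions: $1 is the MediaWiki root ($IP); WIKI_USER owns the wiki files.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"                                  # dry run: just show the command
  else
    sudo -u "${WIKI_USER:-www-data}" "$@"
  fi
}

cirrus_initial_index() {
  ip="${1:?usage: cirrus_initial_index <mediawiki-root>}"
  m="$ip/extensions/CirrusSearch/maintenance"
  # 1. Create the Elasticsearch indices and mappings for this wiki.
  run php "$m/UpdateSearchIndexConfig.php" || return 1
  # 2. Initial indexing of page content, skipping link counts for now.
  run php "$m/ForceSearchIndex.php" --skipLinks --indexOnSkip || return 1
  # 3. Execute the jobs queued by the previous step.
  run php "$ip/maintenance/runJobs.php" || return 1
  # 4. Second pass: fill in link counts without re-parsing pages.
  run php "$m/ForceSearchIndex.php" --skipParse || return 1
  # 5. Empty the job queue again.
  run php "$ip/maintenance/runJobs.php"
}

# Example: DRY_RUN=1 cirrus_initial_index /var/www/wiki
```

The commented-out `$wgDisableSearchUpdate = true;` line in the configuration above belongs to the same procedure: as far as the CirrusSearch README describes it, the setting is enabled while the index configuration is created and removed before the `ForceSearchIndex.php` passes.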

Revision as of 14:40, 30 August 2022

This is a short manual on how to set up Elasticsearch for use with the MediaWiki extension CirrusSearch, which communicates with the service through the extension Elastica. You should choose an Elasticsearch version that matches your MediaWiki version. Currently I'm using MediaWiki 1.38, for which Elasticsearch 6.8.23+ is recommended. This version runs well on openjdk-11, which is the default Java version on Ubuntu Server 22.04.

Elasticsearch and the extension Elastica are also required by some other MediaWiki extensions, such as Translate, which uses Elasticsearch as a translation memory. Elasticsearch is also used by Nextcloud's Full text search application, and more…

Java Setup

On Ubuntu Server, the default JDK and JRE packages can be installed with the following command.

sudo apt install -y apt-transport-https default-jdk default-jre

To check and switch the current version of Java and Javac, use the following commands.

sudo update-alternatives --config java
#Output
There are 2 choices for the alternative java (providing /usr/bin/java).

  Selection    Path                                            Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/java      1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java   1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
#Output
There are 2 choices for the alternative javac (providing /usr/bin/javac).

  Selection    Path                                          Priority   Status
------------------------------------------------------------
  0            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      auto mode
* 1            /usr/lib/jvm/java-11-openjdk-amd64/bin/javac   1111      manual mode
  2            /usr/lib/jvm/java-8-openjdk-amd64/bin/javac    1081      manual mode

Press <enter> to keep the current choice[*], or type selection number: 1
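To check the active Java from a script (for example after switching with `update-alternatives`), the major version can be read from `java -version`. A small sketch; the helper name `java_major` is hypothetical, and it relies on the `version "..."` line that OpenJDK usually prints:

```shell
#!/bin/sh
# Hypothetical helper: print the Java major version from `java -version`
# output read on stdin. Handles the old "1.8.0_xxx" and the newer "11.0.16" schemes.
java_major() {
  # Take the first quoted version string, e.g. 11.0.16 or 1.8.0_342.
  v=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    "")  return 1 ;;                            # no version line found
    1.*) printf '%s\n' "$v" | cut -d. -f2 ;;    # 1.8.0_342 -> 8
    *)   printf '%s\n' "$v" | cut -d. -f1 ;;    # 11.0.16  -> 11
  esac
}

# Usage: `java -version` writes to stderr, hence the 2>&1.
# java -version 2>&1 | java_major
```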

Elasticsearch 5.x requires openjdk-8, which can be installed with the following commands. After the installation, use the commands above to switch the version in use.

#Details
sudo apt install openjdk-8-jre-headless 
sudo apt install openjdk-8-jdk-headless

After switching the Java version, you need to restart the Elasticsearch service if it is already installed.

sudo systemctl restart elasticsearch.service 
curl 'http://127.0.0.1:9200' # do a test

Elasticsearch Setup

Installation

There are several ways to install Elasticsearch: via Docker, via the APT repository, via .deb or .rpm packages, etc. I prefer to download and install the .deb package manually. As I said before, MediaWiki 1.38 needs version 6.8.23+.

cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
sudo apt install ./elasticsearch-6.8.23.deb
#Versions
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb
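The file names listed above differ between major versions: the 7.x package carries an `-amd64` suffix, while the 5.x and 6.x packages do not. A sketch that builds the URL and checks the download against the `.sha512` file Elastic publishes next to each package (assumption: that file is in `sha512sum` format); `es_deb_url` and `es_verify` are hypothetical names:

```shell
#!/bin/sh
# Hypothetical helper: build the .deb download URL for a given version.
# Assumption: 7.x and later use the "-amd64" suffix, 5.x and 6.x do not.
es_deb_url() {
  base="https://artifacts.elastic.co/downloads/elasticsearch"
  case "$1" in
    5.*|6.*) echo "$base/elasticsearch-$1.deb" ;;
    *)       echo "$base/elasticsearch-$1-amd64.deb" ;;
  esac
}

# Verify a downloaded package against "<package>.sha512" in the same directory.
es_verify() {
  (cd "$(dirname "$1")" && sha512sum -c "$(basename "$1").sha512")
}

# Usage:
# url=$(es_deb_url 6.8.23)
# wget "$url" "$url.sha512"
# es_verify ./elasticsearch-6.8.23.deb && sudo apt install ./elasticsearch-6.8.23.deb
```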

After installing the package, the Elasticsearch service must be enabled and started.

sudo systemctl enable --now elasticsearch.service   # enable and start the service
systemctl status elasticsearch.service              # check the status of the service
systemctl cat elasticsearch.service                 # check the current service's configuration
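Right after `systemctl enable --now` the HTTP port may not answer yet, so a script that continues straight to the check below could fail. A minimal polling sketch, assuming `curl` is installed; `wait_for_es`, the 2-second interval and the default of 30 attempts are arbitrary, hypothetical choices:

```shell
#!/bin/sh
# Sketch: poll the Elasticsearch HTTP endpoint until it answers or we give up.
wait_for_es() {
  url="${1:-http://127.0.0.1:9200}"
  tries="${2:-30}"
  i=1
  while [ "$i" -le "$tries" ]; do
    # -f: treat HTTP errors as failures, -s: silent; any 2xx reply means "up".
    if curl -fs -o /dev/null "$url"; then
      echo "Elasticsearch is up at $url"
      return 0
    fi
    [ "$i" -lt "$tries" ] && sleep 2           # no pause after the last attempt
    i=$((i + 1))
  done
  echo "Elasticsearch did not answer at $url after $tries attempts" >&2
  return 1
}

# Example: sudo systemctl enable --now elasticsearch.service && wait_for_es
```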

Check

You can check whether the service works properly as follows.

curl 'http://127.0.0.1:9200'
#Output
{
  "name" : "W2uxKNc",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "JwkNoPi_THuiCA123-HKMg",
  "version" : {
    "number" : "6.8.23",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "4f67856",
    "build_date" : "2022-01-06T21:30:50.087716Z",
    "build_snapshot" : false,
    "lucene_version" : "7.7.3",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
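The `number` field is the value to compare with the version your MediaWiki release expects. A sketch that extracts it without `jq` and tests a lower bound with GNU `sort -V`; both helper names are hypothetical, and the check says nothing about an upper limit, which also matters when matching CirrusSearch to Elasticsearch:

```shell
#!/bin/sh
# Hypothetical helper: read the JSON from GET / on stdin, print "version.number".
es_version() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: succeed if version $2 is at least $1 (uses GNU sort -V).
es_version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
# v=$(curl -s 'http://127.0.0.1:9200' | es_version)
# es_version_at_least 6.8.23 "$v" && echo "OK: $v"
```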

More detailed information can be obtained with the following command.

curl -XGET 'http://localhost:9200/_nodes?pretty'
#Output
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "cluster_name" : "elasticsearch",
  "nodes" : {
    "W2uxKNc9SQqZSVN4RIZmNg" : {
      "name" : "W2uxKNc",
      "transport_address" : "127.0.0.1:9300",
      "host" : "127.0.0.1",
      "ip" : "127.0.0.1",
      "version" : "6.8.23",
      "build_flavor" : "default",
      "build_type" : "deb",
      "build_hash" : "4f67856",
      "total_indexing_buffer" : 418159001,
      "roles" : [
        "master",
        "data",
        "ingest"
      ],
      "attributes" : {
        "ml.machine_memory" : "24456527872",
        "xpack.installed" : "true",
        "ml.max_open_jobs" : "20",
        "ml.enabled" : "true"
      },
      "settings" : {
        "pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
        "cluster" : {
          "name" : "elasticsearch"
        },
        "node" : {
          "attr" : {
            "xpack" : {
              "installed" : "true"
            },
            "ml" : {
              "machine_memory" : "24456527872",
              "max_open_jobs" : "20",
              "enabled" : "true"
            }
          },
          "name" : "W2uxKNc"
        },
        "path" : {
          "data" : [
            "/var/lib/elasticsearch"
          ],
          "logs" : "/var/log/elasticsearch",
          "home" : "/usr/share/elasticsearch"
        },
        "client" : {
          "type" : "node"
        },
        "http" : {
          "type" : "security4",
          "type.default" : "netty4"
        },
        "transport" : {
          "type" : "security4",
          "features" : {
            "x-pack" : "true"
          },
          "type.default" : "netty4"
        }
      },
      "os" : {
        "refresh_interval_in_millis" : 1000,
        "name" : "Linux",
        "pretty_name" : "Ubuntu 22.04.1 LTS",
        "arch" : "amd64",
        "version" : "5.15.0-46-generic",
        "available_processors" : 16,
        "allocated_processors" : 16
      },
      "process" : {
        "refresh_interval_in_millis" : 1000,
        "id" : 1041,
        "mlockall" : false
      },
      "jvm" : {
        "pid" : 1041,
        "version" : "11.0.16",
        "vm_name" : "OpenJDK 64-Bit Server VM",
        "vm_version" : "11.0.16+8-post-Ubuntu-0ubuntu122.04",
        "vm_vendor" : "Ubuntu",
        "start_time_in_millis" : 1661769755777,
        "mem" : {
          "heap_init_in_bytes" : 4294967296,
          "heap_max_in_bytes" : 4181590016,
          "non_heap_init_in_bytes" : 7667712,
          "non_heap_max_in_bytes" : 0,
          "direct_max_in_bytes" : 0
        },
        "gc_collectors" : [
          "ParNew",
          "ConcurrentMarkSweep"
        ],
        "memory_pools" : [
          "CodeHeap 'non-nmethods'",
          "Metaspace",
          "CodeHeap 'profiled nmethods'",
          "Compressed Class Space",
          "Par Eden Space",
          "Par Survivor Space",
          "CodeHeap 'non-profiled nmethods'",
          "CMS Old Gen"
        ],
        "using_compressed_ordinary_object_pointers" : "true",
        "input_arguments" : [
          "-Xms4g",
          "-Xmx4g",
          "-XX:+UseConcMarkSweepGC",
          "-XX:CMSInitiatingOccupancyFraction=75",
          "-XX:+UseCMSInitiatingOccupancyOnly",
          "-Des.networkaddress.cache.ttl=60",
          "-Des.networkaddress.cache.negative.ttl=10",
          "-XX:+AlwaysPreTouch",
          "-Xss1m",
          "-Djava.awt.headless=true",
          "-Dfile.encoding=UTF-8",
          "-Djna.nosys=true",
          "-XX:-OmitStackTraceInFastThrow",
          "-Dio.netty.noUnsafe=true",
          "-Dio.netty.noKeySetOptimization=true",
          "-Dio.netty.recycler.maxCapacityPerThread=0",
          "-Dlog4j.shutdownHookEnabled=false",
          "-Dlog4j2.disable.jmx=true",
          "-Dlog4j2.formatMsgNoLookups=true",
          "-Djava.io.tmpdir=/tmp/elasticsearch-14060835447651286248",
          "-XX:+HeapDumpOnOutOfMemoryError",
          "-XX:HeapDumpPath=/var/lib/elasticsearch",
          "-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log",
          "-Djava.locale.providers=COMPAT",
          "-XX:UseAVX=2",
          "-Des.path.home=/usr/share/elasticsearch",
          "-Des.path.conf=/etc/elasticsearch",
          "-Des.distribution.flavor=default",
          "-Des.distribution.type=deb"
        ]
      },
      "thread_pool" : {
        "watcher" : {
          "type" : "fixed",
          "min" : 50,
          "max" : 50,
          "queue_size" : 1000
        },
        "force_merge" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : -1
        },
        "security-token-key" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : 1000
        },
        "ml_datafeed" : {
          "type" : "fixed",
          "min" : 20,
          "max" : 20,
          "queue_size" : 200
        },
        "fetch_shard_started" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "listener" : {
          "type" : "fixed",
          "min" : 8,
          "max" : 8,
          "queue_size" : -1
        },
        "ml_autodetect" : {
          "type" : "fixed",
          "min" : 80,
          "max" : 80,
          "queue_size" : 80
        },
        "index" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 200
        },
        "refresh" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 8,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "generic" : {
          "type" : "scaling",
          "min" : 4,
          "max" : 128,
          "keep_alive" : "30s",
          "queue_size" : -1
        },
        "rollup_indexing" : {
          "type" : "fixed",
          "min" : 4,
          "max" : 4,
          "queue_size" : 4
        },
        "warmer" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search" : {
          "type" : "fixed_auto_queue_size",
          "min" : 25,
          "max" : 25,
          "queue_size" : 1000
        },
        "ccr" : {
          "type" : "fixed",
          "min" : 32,
          "max" : 32,
          "queue_size" : 100
        },
        "flush" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "fetch_shard_store" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 32,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "management" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "ml_utility" : {
          "type" : "fixed",
          "min" : 80,
          "max" : 80,
          "queue_size" : 500
        },
        "get" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 1000
        },
        "analyze" : {
          "type" : "fixed",
          "min" : 1,
          "max" : 1,
          "queue_size" : 16
        },
        "write" : {
          "type" : "fixed",
          "min" : 16,
          "max" : 16,
          "queue_size" : 200
        },
        "snapshot" : {
          "type" : "scaling",
          "min" : 1,
          "max" : 5,
          "keep_alive" : "5m",
          "queue_size" : -1
        },
        "search_throttled" : {
          "type" : "fixed_auto_queue_size",
          "min" : 1,
          "max" : 1,
          "queue_size" : 100
        }
      },
      "transport" : {
        "bound_address" : [
          "[::1]:9300",
          "127.0.0.1:9300"
        ],
        "publish_address" : "127.0.0.1:9300",
        "profiles" : { }
      },
      "http" : {
        "bound_address" : [
          "[::1]:9200",
          "127.0.0.1:9200"
        ],
        "publish_address" : "127.0.0.1:9200",
        "max_content_length_in_bytes" : 104857600
      },
      "plugins" : [ ],
      "modules" : [
        {
          "name" : "aggs-matrix-stats",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
          "classname" : "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "analysis-common",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds \"built in\" analyzers to Elasticsearch.",
          "classname" : "org.elasticsearch.analysis.common.CommonAnalysisPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-common",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
          "classname" : "org.elasticsearch.ingest.common.IngestCommonPlugin",
          "extended_plugins" : [
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-geoip",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
          "classname" : "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "ingest-user-agent",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Ingest processor that extracts information from a user agent",
          "classname" : "org.elasticsearch.ingest.useragent.IngestUserAgentPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-expression",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Lucene expressions integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.expression.ExpressionPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-mustache",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Mustache scripting integration for Elasticsearch",
          "classname" : "org.elasticsearch.script.mustache.MustachePlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "lang-painless",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "An easy, safe and fast scripting language for Elasticsearch",
          "classname" : "org.elasticsearch.painless.PainlessPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "mapper-extras",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Adds advanced field mappers",
          "classname" : "org.elasticsearch.index.mapper.MapperExtrasPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "parent-join",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "This module adds the support parent-child queries and aggregations",
          "classname" : "org.elasticsearch.join.ParentJoinPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "percolator",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Percolator module adds capability to index queries and query these queries by specifying documents",
          "classname" : "org.elasticsearch.percolator.PercolatorPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "rank-eval",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Rank Eval module adds APIs to evaluate ranking quality.",
          "classname" : "org.elasticsearch.index.rankeval.RankEvalPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "reindex",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
          "classname" : "org.elasticsearch.index.reindex.ReindexPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "repository-url",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Module for URL repository",
          "classname" : "org.elasticsearch.plugin.repository.url.URLRepositoryPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "transport-netty4",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Netty 4 based transport implementation",
          "classname" : "org.elasticsearch.transport.Netty4Plugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "tribe",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Tribe module",
          "classname" : "org.elasticsearch.tribe.TribePlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ccr",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - CCR",
          "classname" : "org.elasticsearch.xpack.ccr.Ccr",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-core",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Core",
          "classname" : "org.elasticsearch.xpack.core.XPackPlugin",
          "extended_plugins" : [ ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-deprecation",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Deprecation",
          "classname" : "org.elasticsearch.xpack.deprecation.Deprecation",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-graph",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Graph",
          "classname" : "org.elasticsearch.xpack.graph.Graph",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ilm",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Index Lifecycle Management",
          "classname" : "org.elasticsearch.xpack.indexlifecycle.IndexLifecycle",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-logstash",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Logstash",
          "classname" : "org.elasticsearch.xpack.logstash.Logstash",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-ml",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Machine Learning",
          "classname" : "org.elasticsearch.xpack.ml.MachineLearning",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : true
        },
        {
          "name" : "x-pack-monitoring",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Monitoring",
          "classname" : "org.elasticsearch.xpack.monitoring.Monitoring",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-rollup",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Rollup",
          "classname" : "org.elasticsearch.xpack.rollup.Rollup",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-security",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Security",
          "classname" : "org.elasticsearch.xpack.security.Security",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-sql",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "The Elasticsearch plugin that powers SQL for Elasticsearch",
          "classname" : "org.elasticsearch.xpack.sql.plugin.SqlPlugin",
          "extended_plugins" : [
            "x-pack-core",
            "lang-painless"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-upgrade",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Upgrade",
          "classname" : "org.elasticsearch.xpack.upgrade.Upgrade",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        },
        {
          "name" : "x-pack-watcher",
          "version" : "6.8.23",
          "elasticsearch_version" : "6.8.23",
          "java_version" : "1.8",
          "description" : "Elasticsearch Expanded Pack Plugin - Watcher",
          "classname" : "org.elasticsearch.xpack.watcher.Watcher",
          "extended_plugins" : [
            "x-pack-core"
          ],
          "has_native_controller" : false
        }
      ],
      "ingest" : {
        "processors" : [
          {
            "type" : "append"
          },
          {
            "type" : "bytes"
          },
          {
            "type" : "convert"
          },
          {
            "type" : "date"
          },
          {
            "type" : "date_index_name"
          },
          {
            "type" : "dissect"
          },
          {
            "type" : "dot_expander"
          },
          {
            "type" : "drop"
          },
          {
            "type" : "fail"
          },
          {
            "type" : "foreach"
          },
          {
            "type" : "geoip"
          },
          {
            "type" : "grok"
          },
          {
            "type" : "gsub"
          },
          {
            "type" : "join"
          },
          {
            "type" : "json"
          },
          {
            "type" : "kv"
          },
          {
            "type" : "lowercase"
          },
          {
            "type" : "pipeline"
          },
          {
            "type" : "remove"
          },
          {
            "type" : "rename"
          },
          {
            "type" : "script"
          },
          {
            "type" : "set"
          },
          {
            "type" : "set_security_user"
          },
          {
            "type" : "sort"
          },
          {
            "type" : "split"
          },
          {
            "type" : "trim"
          },
          {
            "type" : "uppercase"
          },
          {
            "type" : "urldecode"
          },
          {
            "type" : "user_agent"
          }
        ]
      }
    }
  }
}

Tweaks

Elasticsearch can use a huge amount of RAM. However, I've tested it on thin instances and it works even with a heap of only 128m. The main configuration files are located in the directory /etc/elasticsearch/. You can adjust the amount of RAM in use by editing the relevant lines in the file jvm.options. Note that the Xms and Xmx values must be equal.

sudo nano /etc/elasticsearch/jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space

#-Xms512m
#-Xmx512m
-Xms4g
-Xmx4g
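A mismatch between the two values is easy to overlook after editing, so a small check like the following can help. It is a minimal sketch: the function name check_heap is hypothetical. It reads only active (uncommented) -Xms/-Xmx lines and compares the values as text.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the active -Xms and -Xmx values in
# jvm.options are identical. Defaults to the Debian/RPM package location.
check_heap() {
  local file="${1:-/etc/elasticsearch/jvm.options}"
  local xms xmx
  # Commented lines (starting with #) do not match; if a flag repeats,
  # keep the last active occurrence.
  xms="$(sed -nE 's/^-Xms([0-9]+[kKmMgG]?)[[:space:]]*$/\1/p' "$file" | tail -n 1)"
  xmx="$(sed -nE 's/^-Xmx([0-9]+[kKmMgG]?)[[:space:]]*$/\1/p' "$file" | tail -n 1)"
  if [ -z "$xms" ] || [ -z "$xmx" ]; then
    echo "missing -Xms or -Xmx in $file" >&2
    return 2
  fi
  if [ "$xms" != "$xmx" ]; then
    echo "heap mismatch: Xms=$xms Xmx=$xmx" >&2
    return 1
  fi
  echo "heap OK: $xms"
}
```

Usage: `check_heap` (reads /etc/elasticsearch/jvm.options) or `check_heap /path/to/jvm.options`. Because the comparison is textual, 4g and 4096m would be reported as a mismatch even though they are the same size. Writing both values in the same unit avoids that.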

Add a Restart=always directive to the Elasticsearch systemd unit.

sudo systemctl edit elasticsearch.service
[Service]
# SZS/MLT Tweak
Restart=always
RestartSec=3
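If you prefer not to use the interactive editor, you can write the same drop-in directly. This sketch assumes the default location used by `systemctl edit`, /etc/systemd/system/elasticsearch.service.d/override.conf. The function name write_restart_override and its optional directory argument are hypothetical; the argument exists only to make the function easy to test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the Restart=always drop-in without an editor.
# Must be run as root when writing under /etc/systemd/system.
write_restart_override() {
  local dir="${1:-/etc/systemd/system/elasticsearch.service.d}"
  mkdir -p "$dir"
  # Quoted heredoc delimiter: the content is written literally.
  cat > "$dir/override.conf" <<'EOF'
[Service]
# SZS/MLT Tweak
Restart=always
RestartSec=3
EOF
}
```

After running it as root, apply it with the commands in the next step. `systemctl cat elasticsearch.service` should then list the override file.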

To apply the changes, use the following commands.

sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
systemctl status elasticsearch.service
systemctl cat elasticsearch.service
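After a restart, Elasticsearch usually needs a few seconds before it accepts requests. The following sketch polls the cluster health API (`/_cluster/health`) until it reports green or yellow. It assumes the default endpoint http://localhost:9200 and that curl is installed. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: wait until Elasticsearch answers and print its status.
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed default HTTP endpoint

# Read _cluster/health JSON on stdin and print the "status" value
# (green, yellow or red); prints nothing if the field is absent.
es_health_status() {
  sed -nE 's/.*"status"[[:space:]]*:[[:space:]]*"([a-z]+)".*/\1/p' | head -n 1
}

# Poll up to N times (default 30), 2 seconds apart.
es_wait_ready() {
  local tries="${1:-30}" status
  while [ "$tries" -gt 0 ]; do
    status="$(curl -fsS "$ES_URL/_cluster/health" 2>/dev/null | es_health_status)"
    case "$status" in
      green|yellow) echo "$status"; return 0 ;;
    esac
    tries=$((tries - 1))
    sleep 2
  done
  echo "Elasticsearch not ready at $ES_URL" >&2
  return 1
}
```

On a single-node setup, yellow is the normal state when indices have replicas configured, because a replica cannot be allocated on the same node as its primary. For that reason, the loop accepts yellow as ready.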

MediaWiki Setup

The main purpose of this guide is to show how to set up Elasticsearch for use by the MediaWiki extension CirrusSearch, and this section describes how to do that. In addition, the extension AdvancedSearch will be installed and configured. How to configure the extension Translate to use Elasticsearch is described in the MediaWiki documentation, in the article Translation memories.

Install the Extensions

First, install the extensions within MediaWiki's document root. The following example uses the Download from Git approach.

: "${IP:=/var/www/wiki.example.com}" # The DocumentRoot directory of the wiki
: "${OWNER:=www-data}"               # The user that owns the $IP directory
: "${BRANCH:=REL1_38}"               # The MediaWiki branch in use
cd "$IP/extensions"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/AdvancedSearch --branch "${BRANCH}"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica --branch "${BRANCH}"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch --branch "${BRANCH}"
sudo chown -R "${OWNER}:${OWNER}" AdvancedSearch/ Elastica/ CirrusSearch/
# Run Composer inside each extension directory (-d sets the working directory)
for ext in Elastica CirrusSearch; do sudo -u "${OWNER}" composer install --no-dev -d "$ext"; done
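Before you edit LocalSettings.php, it can be worth confirming that the checkout is complete. The following hypothetical check assumes two things: that each extension ships an extension.json (the file wfLoadExtension() reads), and that `composer install` has created vendor/autoload.php for Elastica and CirrusSearch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report missing extension files; returns 1 if any are missing.
check_extensions() {
  local ext_dir="${1:-$IP/extensions}" rc=0 ext
  for ext in AdvancedSearch Elastica CirrusSearch; do
    if [ ! -f "$ext_dir/$ext/extension.json" ]; then
      echo "missing: $ext/extension.json" >&2
      rc=1
    fi
  done
  # Composer always generates vendor/autoload.php when install succeeds.
  for ext in Elastica CirrusSearch; do
    if [ ! -f "$ext_dir/$ext/vendor/autoload.php" ]; then
      echo "missing: $ext/vendor/autoload.php (run composer install --no-dev)" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: `check_extensions` (uses "$IP/extensions") or `check_extensions /path/to/extensions`.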

LocalSettings.php Configuration

Open the configuration file with your favorite editor and place the following lines at a suitable place (the end of the file is a good choice). The example below shows the current configuration of this wiki. After the search index has been built (next section), CirrusSearch should work even without the advanced setup.

sudo nano "$IP/LocalSettings.php"
<?php
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
$wgAdvancedSearchDeepcatEnabled = false;    // https://www.mediawiki.org/wiki/Topic:Uw036nwsilvb6w3t
$wgAdvancedSearchBetaFeature = false;       // (enable it by default) https://m.mediawiki.org/wiki/Topic:Upflskaswcvrunka
$wgAdvancedSearchHighlighting = true;       // https://www.mediawiki.org/wiki/Manual:Configuration_settings_(alphabetical)
$wgOpenSearchDescriptionLength = 2500;      // https://www.mediawiki.org/wiki/Manual:$wgOpenSearchDescriptionLength

## Extension:Elastica
wfLoadExtension( 'Elastica' );

## Extension:CirrusSearch
wfLoadExtension( 'CirrusSearch' );
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log";
$wgCirrusSearchIndexBaseName = 'wiki_mlt_mlw';
// $wgCirrusSearchServers = [ '10.12.201.1' ];
// $wgCirrusExploreSimilarResults = true;

## Extension:CirrusSearch Advanced Setup
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50
$wgCirrusSearchFragmentSize = 200;
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder';
// $wgCirrusSearchCompletionProfiles = 'normal';
$wgCirrusSearchNamespaceWeights = [
    "2" => 0.05,
    "4" => 0.3,
    "6" => 0.2,
    "8" => 0.05,
    "10" => 0.005,
    "12" => 0.2,
    "14" => 0.1
];  // https://www.mediawiki.org/wiki/Help:Namespaces#Localisation
$wgCirrusSearchWeights = [
    "title" => 20,
    "redirect" => 15,
    "category" => 8,
    "heading" => 5,
    "opening_text" => 3,
    "text" => 5,
    "auxiliary_text" => 15,
    "file_text" => 25
];
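To double-check the result, a plain text search like the following sketch can confirm that the three extensions are loaded and that $wgSearchType selects CirrusSearch. The helper is hypothetical and does not evaluate PHP. Lines commented out with // or # do not count, but block comments are not detected.

```shell
#!/usr/bin/env bash
# Hypothetical helper: text-level check of LocalSettings.php.
check_localsettings() {
  local file="${1:-$IP/LocalSettings.php}" rc=0 ext
  for ext in AdvancedSearch Elastica CirrusSearch; do
    # Only lines that start (after whitespace) with wfLoadExtension( 'Name' ) count.
    if ! grep -Eq "^[[:space:]]*wfLoadExtension\([[:space:]]*[\"']$ext[\"'][[:space:]]*\)" "$file"; then
      echo "not loaded: $ext" >&2
      rc=1
    fi
  done
  if ! grep -Eq "^[[:space:]]*[\$]wgSearchType[[:space:]]*=[[:space:]]*[\"']CirrusSearch[\"']" "$file"; then
    echo "\$wgSearchType is not set to 'CirrusSearch'" >&2
    rc=1
  fi
  return "$rc"
}
```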

Build Search Index

To start regular indexing of the wiki's content according to the configuration in /var/www/*/LocalSettings.php and the documentation of mw:Extension:CirrusSearch, you must first perform an initial indexing, run the jobs it creates, regenerate the content index, and then empty the job queue again. The maintenance scripts described in the section MediaWiki can be used for this purpose.

mw-maintenance-elasticsearch-index.sh
mw-maintenance-runJobs.sh cli
mw-maintenance-rebuildAll.sh
mw-maintenance-runJobs.sh cli
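The wrapper scripts above are described in the MediaWiki section of this guide and are not reproduced here. For orientation, the following hypothetical sketch shows the kind of sequence the CirrusSearch README describes for bootstrapping an index: create the index configuration, index the pages in two passes, then drain the job queue. The file names UpdateSearchIndexConfig.php and ForceSearchIndex.php are assumed for REL1_38 (older branches may use lowercase names), so check the README of your branch. With DRY_RUN=1, the sketch prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of an initial CirrusSearch indexing run.
: "${IP:=/var/www/wiki.example.com}"   # wiki DocumentRoot (assumed)
: "${OWNER:=www-data}"                 # user that owns $IP
: "${DRY_RUN:=0}"                      # 1 = print commands only

run_as_owner() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    sudo -u "$OWNER" "$@"
  fi
}

cirrus_initial_index() {
  local m="$IP/extensions/CirrusSearch/maintenance"
  # 1. Create the indices and their mappings in Elasticsearch.
  run_as_owner php "$m/UpdateSearchIndexConfig.php" || return
  # 2. First pass parses and indexes pages but skips link counting;
  #    second pass skips parsing and fills in the link data.
  run_as_owner php "$m/ForceSearchIndex.php" --skipLinks --indexOnSkip || return
  run_as_owner php "$m/ForceSearchIndex.php" --skipParse || return
  # 3. Drain the job queue filled by the indexing passes.
  run_as_owner php "$IP/maintenance/runJobs.php"
}
```

The README also suggests setting $wgDisableSearchUpdate = true (commented out in the configuration above) while the index configuration is created, and removing it again before the indexing passes.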

In addition, the script elasticsearch-watch.sh was developed. A crontab job runs it to check Elasticsearch periodically and to restart it when necessary. The script sends an email to vectoria@altclavis.com whenever such an event occurs.

Additional Setup

Access Elasticsearch via SSH Tunnel

This approach is suitable only for testing purposes. Here is a manual on how to set it up:
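As a minimal sketch, the following sets up a local port forward. It assumes that Elasticsearch listens on 127.0.0.1:9200 on a remote host reachable by SSH. es_tunnel is a hypothetical helper, and user@host is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: forward a local port to Elasticsearch on a remote host.
# -N: run no remote command, only forward; -L local_port:host:remote_port.
es_tunnel() {
  local target="${1:?usage: es_tunnel user@host [local_port] [remote_port]}"
  local lport="${2:-9200}" rport="${3:-9200}"
  ssh -N -L "${lport}:127.0.0.1:${rport}" "$target"
}
```

While the tunnel is open, `curl http://localhost:9200` on the local machine reaches the remote node. With -N, SSH stays in the foreground, so stop the tunnel with Ctrl+C.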

Elasticsearch watch scripts

sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/elasticsearch-watch.sh
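The original elasticsearch-watch.sh is not published in this guide. The following hypothetical sketch shows one way such a script could work: check the health endpoint, restart the service if it does not answer, and send a notification. It assumes curl, systemd and a `mail` command (for example from mailutils) are available. ES_URL and MAIL_TO are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of /usr/local/bin/elasticsearch-watch.sh (run from root's crontab).
ES_URL="${ES_URL:-http://localhost:9200}"   # assumed endpoint
MAIL_TO="${MAIL_TO:-root}"                  # placeholder recipient

# True if the health endpoint answers with a 2xx status within 10 seconds.
es_is_up() {
  curl -fsS --max-time 10 "$ES_URL/_cluster/health" >/dev/null 2>&1
}

es_restart() {
  systemctl restart elasticsearch.service
}

es_notify() {
  printf '%s\n' "$1" | mail -s "elasticsearch-watch on $(hostname)" "$MAIL_TO"
}

# Do nothing while Elasticsearch answers; otherwise restart and notify.
es_watch() {
  es_is_up && return 0
  if es_restart; then
    es_notify "Elasticsearch did not respond at $ES_URL; service restarted."
  else
    es_notify "Elasticsearch did not respond at $ES_URL; restart FAILED."
    return 1
  fi
}
```

For the installed script, add a final line that calls es_watch. The Restart=always directive already restarts Elasticsearch when the process exits. A watch like this additionally catches the case where the process is still running but no longer answers HTTP requests. The check, restart and notification steps are kept as separate functions so that they can be replaced individually.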

References