MediaWiki Extension CirrusSearch and Elasticsearch Setup: Difference between revisions
Line 870: | Line 870: | ||
== MediaWiki Setup == | == MediaWiki Setup == | ||
The main purpose of this guide is how to set up Elasticsearch to be used by MediaWiki's extension [[mw:Extension:CirrusSearch|'''CirrusSearch''']], so in this section we will describe how to do that. In addition also the extension [[mw:Extension:AdvancedSearch|AdvancedSearch]] will be installed and configured. How to configure extension Translate to use Elasticsearch is described in the MediaWiki's documentation in the article [[mw:Help:Extension:Translate/Translation memories#ElasticSearch%20backend|Translation memories]]. | The main purpose of this guide is how to set up Elasticsearch to be used by MediaWiki's extension [[mw:Extension:CirrusSearch|'''CirrusSearch''']], so in this section we will describe how to do that. In addition also the extension [[mw:Extension:AdvancedSearch|AdvancedSearch]] will be installed and configured. | ||
If you have installed the extension [[mw:Extension:PdfHandler|PdfHandler]] (or some other file handling extension) CirrusSearch will show results from the files content - in the configuration below is shown how to boost these results. How to configure extension Translate to use Elasticsearch is described in the MediaWiki's documentation in the article [[mw:Help:Extension:Translate/Translation memories#ElasticSearch%20backend|Translation memories]]. | |||
=== Install the Extensions === | === Install the Extensions === | ||
Line 890: | Line 892: | ||
=== LocalSettings.php Configuration === | === LocalSettings.php Configuration === | ||
Open the configuration file with your favorite editor and place the following lines at suitable place (the end of the file is good place). In the example below is shown the current configuration of this wiki. After the building of the search index (next section) CirrusSearch should work without the advanced setup.<syntaxhighlight lang="shell" line="1"> | Open the configuration file with your favorite editor and place the following lines at suitable place (the end of the file is good place). In the example below is shown the current configuration of this wiki. After the building of the search index (next section) CirrusSearch should work without the advanced setup. More options are described in the [[mw:Extension:CirrusSearch#Configuration|Extension:CirrusSearch]] page, also some undocumented options could be found within its <code>[https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CirrusSearch/+/HEAD/extension.json extension.json]</code> file.<syntaxhighlight lang="shell" line="1"> | ||
sudo nano "$IP/LocalSettings.php" | sudo nano "$IP/LocalSettings.php" | ||
</syntaxhighlight><syntaxhighlight lang="php" line="1" start="750"> | </syntaxhighlight><syntaxhighlight lang="php" line="1" start="750"> | ||
## Extension:AdvancedSearch | ## Extension:AdvancedSearch | ||
wfLoadExtension( 'AdvancedSearch' ); | wfLoadExtension( 'AdvancedSearch' ); | ||
Line 906: | Line 907: | ||
## Extension:CirrusSearch | ## Extension:CirrusSearch | ||
wfLoadExtension( 'CirrusSearch' ); | wfLoadExtension( 'CirrusSearch' ); | ||
// $wgDisableSearchUpdate = true; | // $wgDisableSearchUpdate = true; | ||
$wgSearchType = 'CirrusSearch'; | $wgSearchType = 'CirrusSearch'; | ||
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log"; | $wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log"; | ||
$wgCirrusSearchIndexBaseName = ' | // $wgCirrusSearchIndexBaseName = 'wiki_db_name'; // https://www.mediawiki.org/wiki/Extension:CirrusSearch#Configuration | ||
// $wgCirrusSearchServers = [ '10. | // $wgCirrusSearchServers = [ '10.120.201.1' ]; // The address of the Elasticsearch server if it is not available at 'localhost' | ||
// | |||
## Extension:CirrusSearch Advanced Setup | ## Extension:CirrusSearch Advanced Setup | ||
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks'; | |||
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder'; | |||
// $wgCirrusSearchCompletionProfiles = 'normal'; | |||
$wgCirrusSearchPhraseSuggestUseText = true; | $wgCirrusSearchPhraseSuggestUseText = true; | ||
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50 | $wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50 | ||
$wgCirrusSearchFragmentSize = 200; | $wgCirrusSearchFragmentSize = 200; | ||
// $ | // $wgCirrusExploreSimilarResults = true; | ||
// | |||
// Give much weight to the "file_text" in order to show | |||
// results from the PDFs content. This requires PdfHandler | |||
$wgCirrusSearchWeights = [ | $wgCirrusSearchWeights = [ | ||
"title" => 20, | "title" => 20, | ||
Line 938: | Line 933: | ||
"auxiliary_text" => 15, | "auxiliary_text" => 15, | ||
"file_text" => 25 | "file_text" => 25 | ||
]; | |||
// https://www.mediawiki.org/wiki/Help:Namespaces#Localisation | |||
$wgCirrusSearchNamespaceWeights = [ | |||
"2" => 0.05, | |||
"4" => 0.3, | |||
"6" => 0.2, | |||
"8" => 0.05, | |||
"10" => 0.005, | |||
"12" => 0.2, | |||
"14" => 0.1 | |||
]; | ]; | ||
</syntaxhighlight> | </syntaxhighlight> | ||
=== Build Search Index === | === Build Search Index === | ||
How to build and update the CirrusSearch/Elasticsearch index is well described in the documents [https://gerrit.wikimedia.org/g/mediawiki/extensions/CirrusSearch/%2B/HEAD/README README] and [https://gerrit.wikimedia.org/g/mediawiki/extensions/CirrusSearch/%2B/HEAD/UPGRADE UPGRADE], which come with the extension. Here the important steps for building the index for the first time are extracted and converted to a script.<syntaxhighlight lang="shell"> | |||
mlw-maintenance-cirrussearch-elasticsearch-create-index.sh | |||
</syntaxhighlight><syntaxhighlight lang="bash" line="1"> | |||
#!/bin/bash | |||
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech> | |||
# @copyright 2022 Spas Z. Spasov | |||
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later) | |||
# | |||
# @name /usr/local/bin/mlw-maintenance-cirrusSearch-elasticsearch-create-index-${IP##*/}.sh | |||
# @desc Create elastic search index for an MediaWiki instance | |||
# | |||
# @source https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README | |||
: ${IP:="/var/www/wiki.metalevel.tech"} # The DocumentRoot directory of the wiki | |||
: ${OWNER:="www-data"} # The user that owns the $IP directory | |||
# STEP 0 | |||
printf -- '\n**\nDisable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}" | |||
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' $IP/LocalSettings.php | |||
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' $IP/LocalSettings.php | |||
echo -e "\n\n**\n$IP/LocalSettings.php\n*\n" | |||
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php | |||
echo | |||
sleep 5 | |||
# STEP 1 | |||
printf -- '\n**\nGenerate ElasticSearch Index for %s ------------\n*\n\n' "${IP##*/}" | |||
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver --conf $IP/LocalSettings.php | |||
echo | |||
sleep 5 | |||
# STEP 2 | |||
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' $IP/LocalSettings.php | |||
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php | |||
echo | |||
sleep 5 | |||
printf -- '\n**\n Bootstrap the Search Index for %s ------------\n*\n\n' "${IP##*/}" | |||
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip --conf $IP/LocalSettings.php | |||
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse --conf $IP/LocalSettings.php | |||
# STEP 3 | |||
sleep 5 | |||
printf -- '\n**\nEnable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}" | |||
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' $IP/LocalSettings.php | |||
echo -e "\n\n**\n$IP/LocalSettings.php\n*\n" | |||
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php | |||
echo | |||
# Step 4 | |||
sleep 5 | |||
printf -- '\n**\nUpdate Cirrus Search Suggestions for %s ------------\n*\n\n' "${IP##*/}" | |||
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php --conf $IP/LocalSettings.php | |||
</syntaxhighlight> | |||
To start regular indexing of the wiki's content, according to the configuration made in <code>/var/www/*/LocalSettings.php</code> and the documentation of [[Mw:Extension:CirrusSearch|mw:Extension:CirrusSearch]], an initial indexing must be performed, the jobs it creates must be run, the content index must be regenerated, and the job queue must be emptied again. The maintenance scripts described in the MediaWiki section can be used for this purpose.<syntaxhighlight lang="shell" line="1"> | To start regular indexing of the wiki's content, according to the configuration made in <code>/var/www/*/LocalSettings.php</code> and the documentation of [[Mw:Extension:CirrusSearch|mw:Extension:CirrusSearch]], an initial indexing must be performed, the jobs it creates must be run, the content index must be regenerated, and the job queue must be emptied again. The maintenance scripts described in the MediaWiki section can be used for this purpose.<syntaxhighlight lang="shell" line="1"> | ||
Line 948: | Line 1,009: | ||
mw-maintenance-rebuildAll.sh | mw-maintenance-rebuildAll.sh | ||
mw-maintenance-runJobs.sh cli | mw-maintenance-runJobs.sh cli | ||
</syntaxhighlight> | </syntaxhighlight> | ||
== Additional Setup == | == Additional Setup == | ||
Line 957: | Line 1,017: | ||
* [[SSH Persistent Tunnel and SSHFS Mount via "systemd" units]]. | * [[SSH Persistent Tunnel and SSHFS Mount via "systemd" units]]. | ||
=== Elasticsearch | === Elasticsearch Watch Scripts === | ||
<syntaxhighlight lang="shell" line="1"> | In addition, the script <code>elasticsearch-watch.sh</code> was developed; a <code>crontab</code> job runs it periodically to check the service and restart it when necessary. The script sends an email to <code>vectoria@altclavis.com</code> if an event occurs.<syntaxhighlight lang="shell" line="1"> | ||
sudo crontab -e | sudo crontab -e | ||
</syntaxhighlight><syntaxhighlight lang="bash"> | </syntaxhighlight><syntaxhighlight lang="bash"> |
Revision as of 14:12, 30 August 2022
This is a short manual on how to set up Elasticsearch for use with MediaWiki's extension CirrusSearch, which communicates with the service through the extension Elastica. You should choose an appropriate Elasticsearch version depending on your MediaWiki version. Currently I'm using MediaWiki 1.38, for which Elasticsearch 6.8.23+ is recommended. This version runs well on openjdk-11, which is the default Java version on Ubuntu Server 22.04.
Elasticsearch and the extension Elastica are also required by some other MediaWiki extensions, such as Translate, where Elasticsearch is used as translation memory. It is also used by Nextcloud's application Full text search, and more…
Java Setup
On Ubuntu Server the default jdk and jre packages can be installed with the following command.
sudo apt install -y apt-transport-https default-jdk default-jre
To check and switch the current version of Java and Javac you can use the following commands.
sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/bin/javac 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
Elasticsearch 5.x requires openjdk-8, which can be installed with the following commands. After the installation, use the above commands to switch the version in use.
sudo apt install openjdk-8-jre-headless
sudo apt install openjdk-8-jdk-headless
After switching the version of Java you need to restart the Elasticsearch service if it is already installed.
sudo systemctl restart elasticsearch.service
curl 'http://127.0.0.1:9200' # do a test
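After switching, it can help to confirm which major Java version is actually active. The following is only a minimal sketch: the helper name java_major is hypothetical, and it assumes the version string follows either the legacy scheme (1.8.0_342) or the modern scheme (11.0.16).

```bash
#!/bin/bash
# Hypothetical helper (not part of any package): print the major Java
# version for a given version string.
#   "1.8.0_342" -> 8   (legacy scheme, major version is the second field)
#   "11.0.16"   -> 11  (modern scheme, major version is the first field)
java_major() {
    local version="$1"
    local first="${version%%.*}"
    if [ "$first" = "1" ]; then
        local rest="${version#1.}"
        echo "${rest%%.*}"
    else
        echo "$first"
    fi
}

# Demo with the version strings that appear in this guide.
echo "11.0.16   -> $(java_major "11.0.16")"
echo "1.8.0_342 -> $(java_major "1.8.0_342")"
```

On a real system the version string could be taken from the first quoted string that java -version prints to stderr, and the result compared with 8 (for Elasticsearch 5.x) or 11 (for 6.8.23).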
Elasticsearch Setup
Installation
There are several ways to install Elasticsearch: via Docker, via the Apt repository, via .deb or .rpm packages, etc. I prefer to manually download and install it from a .deb package. As I said before, for MediaWiki 1.38 we need version 6.8.23+.
cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
sudo apt install ./elasticsearch-6.8.23.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb
After installing the package the Elasticsearch service must be enabled and started.
sudo systemctl enable --now elasticsearch.service # enable and start the service
systemctl status elasticsearch.service # check the status of the service
systemctl cat elasticsearch.service # check the current service's configuration
Check
You can check whether the service works properly with the following request.
curl 'http://127.0.0.1:9200'
{
"name" : "W2uxKNc",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "JwkNoPi_THuiCA123-HKMg",
"version" : {
"number" : "6.8.23",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "4f67856",
"build_date" : "2022-01-06T21:30:50.087716Z",
"build_snapshot" : false,
"lucene_version" : "7.7.3",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
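The version check can also be scripted. The sketch below is an illustration under stated assumptions, not an official tool: the function names es_version_number and version_ge are hypothetical, the JSON is parsed with sed rather than a real JSON parser, and sort -V (GNU coreutils, as shipped with Ubuntu) is assumed to be available. It only tests the lower bound 6.8.23 mentioned in this guide.

```bash
#!/bin/bash
# Hypothetical helpers, not part of Elasticsearch or MediaWiki.

# Read the root endpoint's JSON from stdin and print the "number" field.
es_version_number() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Demo with a trimmed copy of the response shown above.
sample='{ "version" : { "number" : "6.8.23", "build_type" : "deb" } }'
found="$(printf '%s\n' "$sample" | es_version_number)"
if version_ge "$found" "6.8.23"; then
    echo "Elasticsearch $found is new enough"
else
    echo "Elasticsearch $found is older than 6.8.23"
fi
```

Against a live service, the sample string would be replaced by the output of the curl request above piped into es_version_number.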
More detailed information can be obtained by the next command.
curl -XGET 'http://localhost:9200/_nodes?pretty'
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "elasticsearch",
"nodes" : {
"W2uxKNc9SQqZSVN4RIZmNg" : {
"name" : "W2uxKNc",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "6.8.23",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "4f67856",
"total_indexing_buffer" : 418159001,
"roles" : [
"master",
"data",
"ingest"
],
"attributes" : {
"ml.machine_memory" : "24456527872",
"xpack.installed" : "true",
"ml.max_open_jobs" : "20",
"ml.enabled" : "true"
},
"settings" : {
"pidfile" : "/var/run/elasticsearch/elasticsearch.pid",
"cluster" : {
"name" : "elasticsearch"
},
"node" : {
"attr" : {
"xpack" : {
"installed" : "true"
},
"ml" : {
"machine_memory" : "24456527872",
"max_open_jobs" : "20",
"enabled" : "true"
}
},
"name" : "W2uxKNc"
},
"path" : {
"data" : [
"/var/lib/elasticsearch"
],
"logs" : "/var/log/elasticsearch",
"home" : "/usr/share/elasticsearch"
},
"client" : {
"type" : "node"
},
"http" : {
"type" : "security4",
"type.default" : "netty4"
},
"transport" : {
"type" : "security4",
"features" : {
"x-pack" : "true"
},
"type.default" : "netty4"
}
},
"os" : {
"refresh_interval_in_millis" : 1000,
"name" : "Linux",
"pretty_name" : "Ubuntu 22.04.1 LTS",
"arch" : "amd64",
"version" : "5.15.0-46-generic",
"available_processors" : 16,
"allocated_processors" : 16
},
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 1041,
"mlockall" : false
},
"jvm" : {
"pid" : 1041,
"version" : "11.0.16",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "11.0.16+8-post-Ubuntu-0ubuntu122.04",
"vm_vendor" : "Ubuntu",
"start_time_in_millis" : 1661769755777,
"mem" : {
"heap_init_in_bytes" : 4294967296,
"heap_max_in_bytes" : 4181590016,
"non_heap_init_in_bytes" : 7667712,
"non_heap_max_in_bytes" : 0,
"direct_max_in_bytes" : 0
},
"gc_collectors" : [
"ParNew",
"ConcurrentMarkSweep"
],
"memory_pools" : [
"CodeHeap 'non-nmethods'",
"Metaspace",
"CodeHeap 'profiled nmethods'",
"Compressed Class Space",
"Par Eden Space",
"Par Survivor Space",
"CodeHeap 'non-profiled nmethods'",
"CMS Old Gen"
],
"using_compressed_ordinary_object_pointers" : "true",
"input_arguments" : [
"-Xms4g",
"-Xmx4g",
"-XX:+UseConcMarkSweepGC",
"-XX:CMSInitiatingOccupancyFraction=75",
"-XX:+UseCMSInitiatingOccupancyOnly",
"-Des.networkaddress.cache.ttl=60",
"-Des.networkaddress.cache.negative.ttl=10",
"-XX:+AlwaysPreTouch",
"-Xss1m",
"-Djava.awt.headless=true",
"-Dfile.encoding=UTF-8",
"-Djna.nosys=true",
"-XX:-OmitStackTraceInFastThrow",
"-Dio.netty.noUnsafe=true",
"-Dio.netty.noKeySetOptimization=true",
"-Dio.netty.recycler.maxCapacityPerThread=0",
"-Dlog4j.shutdownHookEnabled=false",
"-Dlog4j2.disable.jmx=true",
"-Dlog4j2.formatMsgNoLookups=true",
"-Djava.io.tmpdir=/tmp/elasticsearch-14060835447651286248",
"-XX:+HeapDumpOnOutOfMemoryError",
"-XX:HeapDumpPath=/var/lib/elasticsearch",
"-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log",
"-Djava.locale.providers=COMPAT",
"-XX:UseAVX=2",
"-Des.path.home=/usr/share/elasticsearch",
"-Des.path.conf=/etc/elasticsearch",
"-Des.distribution.flavor=default",
"-Des.distribution.type=deb"
]
},
"thread_pool" : {
"watcher" : {
"type" : "fixed",
"min" : 50,
"max" : 50,
"queue_size" : 1000
},
"force_merge" : {
"type" : "fixed",
"min" : 1,
"max" : 1,
"queue_size" : -1
},
"security-token-key" : {
"type" : "fixed",
"min" : 1,
"max" : 1,
"queue_size" : 1000
},
"ml_datafeed" : {
"type" : "fixed",
"min" : 20,
"max" : 20,
"queue_size" : 200
},
"fetch_shard_started" : {
"type" : "scaling",
"min" : 1,
"max" : 32,
"keep_alive" : "5m",
"queue_size" : -1
},
"listener" : {
"type" : "fixed",
"min" : 8,
"max" : 8,
"queue_size" : -1
},
"ml_autodetect" : {
"type" : "fixed",
"min" : 80,
"max" : 80,
"queue_size" : 80
},
"index" : {
"type" : "fixed",
"min" : 16,
"max" : 16,
"queue_size" : 200
},
"refresh" : {
"type" : "scaling",
"min" : 1,
"max" : 8,
"keep_alive" : "5m",
"queue_size" : -1
},
"generic" : {
"type" : "scaling",
"min" : 4,
"max" : 128,
"keep_alive" : "30s",
"queue_size" : -1
},
"rollup_indexing" : {
"type" : "fixed",
"min" : 4,
"max" : 4,
"queue_size" : 4
},
"warmer" : {
"type" : "scaling",
"min" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"search" : {
"type" : "fixed_auto_queue_size",
"min" : 25,
"max" : 25,
"queue_size" : 1000
},
"ccr" : {
"type" : "fixed",
"min" : 32,
"max" : 32,
"queue_size" : 100
},
"flush" : {
"type" : "scaling",
"min" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"fetch_shard_store" : {
"type" : "scaling",
"min" : 1,
"max" : 32,
"keep_alive" : "5m",
"queue_size" : -1
},
"management" : {
"type" : "scaling",
"min" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"ml_utility" : {
"type" : "fixed",
"min" : 80,
"max" : 80,
"queue_size" : 500
},
"get" : {
"type" : "fixed",
"min" : 16,
"max" : 16,
"queue_size" : 1000
},
"analyze" : {
"type" : "fixed",
"min" : 1,
"max" : 1,
"queue_size" : 16
},
"write" : {
"type" : "fixed",
"min" : 16,
"max" : 16,
"queue_size" : 200
},
"snapshot" : {
"type" : "scaling",
"min" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"search_throttled" : {
"type" : "fixed_auto_queue_size",
"min" : 1,
"max" : 1,
"queue_size" : 100
}
},
"transport" : {
"bound_address" : [
"[::1]:9300",
"127.0.0.1:9300"
],
"publish_address" : "127.0.0.1:9300",
"profiles" : { }
},
"http" : {
"bound_address" : [
"[::1]:9200",
"127.0.0.1:9200"
],
"publish_address" : "127.0.0.1:9200",
"max_content_length_in_bytes" : 104857600
},
"plugins" : [ ],
"modules" : [
{
"name" : "aggs-matrix-stats",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
"classname" : "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "analysis-common",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Adds \"built in\" analyzers to Elasticsearch.",
"classname" : "org.elasticsearch.analysis.common.CommonAnalysisPlugin",
"extended_plugins" : [
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "ingest-common",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
"classname" : "org.elasticsearch.ingest.common.IngestCommonPlugin",
"extended_plugins" : [
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "ingest-geoip",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
"classname" : "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "ingest-user-agent",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Ingest processor that extracts information from a user agent",
"classname" : "org.elasticsearch.ingest.useragent.IngestUserAgentPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "lang-expression",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Lucene expressions integration for Elasticsearch",
"classname" : "org.elasticsearch.script.expression.ExpressionPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "lang-mustache",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Mustache scripting integration for Elasticsearch",
"classname" : "org.elasticsearch.script.mustache.MustachePlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "lang-painless",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "An easy, safe and fast scripting language for Elasticsearch",
"classname" : "org.elasticsearch.painless.PainlessPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "mapper-extras",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Adds advanced field mappers",
"classname" : "org.elasticsearch.index.mapper.MapperExtrasPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "parent-join",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "This module adds the support parent-child queries and aggregations",
"classname" : "org.elasticsearch.join.ParentJoinPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "percolator",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Percolator module adds capability to index queries and query these queries by specifying documents",
"classname" : "org.elasticsearch.percolator.PercolatorPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "rank-eval",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "The Rank Eval module adds APIs to evaluate ranking quality.",
"classname" : "org.elasticsearch.index.rankeval.RankEvalPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "reindex",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
"classname" : "org.elasticsearch.index.reindex.ReindexPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "repository-url",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Module for URL repository",
"classname" : "org.elasticsearch.plugin.repository.url.URLRepositoryPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "transport-netty4",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Netty 4 based transport implementation",
"classname" : "org.elasticsearch.transport.Netty4Plugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "tribe",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Tribe module",
"classname" : "org.elasticsearch.tribe.TribePlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "x-pack-ccr",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - CCR",
"classname" : "org.elasticsearch.xpack.ccr.Ccr",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-core",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Core",
"classname" : "org.elasticsearch.xpack.core.XPackPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "x-pack-deprecation",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Deprecation",
"classname" : "org.elasticsearch.xpack.deprecation.Deprecation",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-graph",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Graph",
"classname" : "org.elasticsearch.xpack.graph.Graph",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-ilm",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Index Lifecycle Management",
"classname" : "org.elasticsearch.xpack.indexlifecycle.IndexLifecycle",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-logstash",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Logstash",
"classname" : "org.elasticsearch.xpack.logstash.Logstash",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-ml",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Machine Learning",
"classname" : "org.elasticsearch.xpack.ml.MachineLearning",
"extended_plugins" : [
"x-pack-core",
"lang-painless"
],
"has_native_controller" : true
},
{
"name" : "x-pack-monitoring",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Monitoring",
"classname" : "org.elasticsearch.xpack.monitoring.Monitoring",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-rollup",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Rollup",
"classname" : "org.elasticsearch.xpack.rollup.Rollup",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-security",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Security",
"classname" : "org.elasticsearch.xpack.security.Security",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-sql",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "The Elasticsearch plugin that powers SQL for Elasticsearch",
"classname" : "org.elasticsearch.xpack.sql.plugin.SqlPlugin",
"extended_plugins" : [
"x-pack-core",
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "x-pack-upgrade",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Upgrade",
"classname" : "org.elasticsearch.xpack.upgrade.Upgrade",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-watcher",
"version" : "6.8.23",
"elasticsearch_version" : "6.8.23",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Watcher",
"classname" : "org.elasticsearch.xpack.watcher.Watcher",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
}
],
"ingest" : {
"processors" : [
{
"type" : "append"
},
{
"type" : "bytes"
},
{
"type" : "convert"
},
{
"type" : "date"
},
{
"type" : "date_index_name"
},
{
"type" : "dissect"
},
{
"type" : "dot_expander"
},
{
"type" : "drop"
},
{
"type" : "fail"
},
{
"type" : "foreach"
},
{
"type" : "geoip"
},
{
"type" : "grok"
},
{
"type" : "gsub"
},
{
"type" : "join"
},
{
"type" : "json"
},
{
"type" : "kv"
},
{
"type" : "lowercase"
},
{
"type" : "pipeline"
},
{
"type" : "remove"
},
{
"type" : "rename"
},
{
"type" : "script"
},
{
"type" : "set"
},
{
"type" : "set_security_user"
},
{
"type" : "sort"
},
{
"type" : "split"
},
{
"type" : "trim"
},
{
"type" : "uppercase"
},
{
"type" : "urldecode"
},
{
"type" : "user_agent"
}
]
}
}
}
}
Tweaks
Elasticsearch can use a huge amount of RAM. However, in my tests on thin instances it works even with only 128m. The main configuration files are located in the directory /etc/elasticsearch/. You can adjust the amount of RAM in use by changing the relevant lines in the file jvm.options. Note that the Xms and Xmx values must be equal.
sudo nano /etc/elasticsearch/jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
#-Xms512m
#-Xmx512m
-Xms4g
-Xmx4g
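Since the two heap values must be equal, a quick check after editing can catch a mismatch before the service is restarted. This is a minimal sketch under assumptions: the function names heap_value and heap_sizes_match are hypothetical, and only plain uncommented lines of the form -Xms4g or -Xmx4g are recognized.

```bash
#!/bin/bash
# Hypothetical helper: print the active (uncommented) value of a heap flag
# ("Xms" or "Xmx") from a jvm.options style file. If the flag is set more
# than once, the last occurrence is printed.
heap_value() {
    local flag="$1" file="$2"
    sed -n "s/^-${flag}\([0-9][0-9]*[kKmMgG]\{0,1\}\)[[:space:]]*\$/\1/p" "$file" | tail -n 1
}

# Succeed when both flags are set and have the same value.
heap_sizes_match() {
    local file="$1" xms xmx
    xms="$(heap_value Xms "$file")"
    xmx="$(heap_value Xmx "$file")"
    [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}

# Demo on a temporary file with the same lines as the snippet above.
demo="$(mktemp)"
printf '%s\n' '#-Xms512m' '#-Xmx512m' '-Xms4g' '-Xmx4g' > "$demo"
if heap_sizes_match "$demo"; then
    echo "heap OK: Xms=$(heap_value Xms "$demo") Xmx=$(heap_value Xmx "$demo")"
else
    echo "heap mismatch"
fi
rm -f "$demo"
```

For the real file, heap_sizes_match would be called with /etc/elasticsearch/jvm.options, which may require sudo to read.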
Add a Restart=always directive to Elasticsearch's systemd unit.
sudo systemctl edit elasticsearch.service
[Service]
# SZS/MLT Tweak
Restart=always
RestartSec=3
To apply the changes use the following commands.
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
systemctl status elasticsearch.service
systemctl cat elasticsearch.service
MediaWiki Setup
The main purpose of this guide is to show how to set up Elasticsearch for use by MediaWiki's extension CirrusSearch, and this section describes how to do that. In addition, the extension AdvancedSearch will be installed and configured.
If you have installed the extension PdfHandler (or some other file handling extension), CirrusSearch will show results from the files' content; the configuration below shows how to boost these results. How to configure the extension Translate to use Elasticsearch is described in MediaWiki's documentation, in the article Translation memories.
Install the Extensions
First of all, you need to install the extensions within MediaWiki's document root. The following example uses the Download from Git approach.
: ${IP:="/var/www/wiki.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"} # The user that owns the $IP directory
: ${BRANCH:="REL1_38"} # The MediaWiki's branch in use
cd "$IP/extensions"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/AdvancedSearch --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch --branch ${BRANCH}
sudo chown -R ${OWNER}:${OWNER} Elastica/ CirrusSearch/
for ext in Elastica CirrusSearch; do (cd "$IP/extensions/$ext" && sudo -u ${OWNER} composer install --no-dev); done
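After cloning, it can help to confirm that all three extensions are where MediaWiki expects them. The following is a minimal sketch: the function name check_extensions is hypothetical, and the check relies on each extension shipping an extension.json manifest in its top-level directory.

```shell
# Hypothetical helper: report which of the three extensions are present
# under "$1/extensions" by looking for their extension.json manifest.
# Returns non-zero if at least one of them is missing.
check_extensions() {
    root="$1"
    missing=0
    for ext in AdvancedSearch Elastica CirrusSearch; do
        if [ -f "$root/extensions/$ext/extension.json" ]; then
            echo "OK       $ext"
        else
            echo "MISSING  $ext"
            missing=1
        fi
    done
    return "$missing"
}
```

It can be called as `check_extensions "$IP"`, using the same $IP variable as in the commands above.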
LocalSettings.php Configuration
Open the configuration file with your favorite editor and place the following lines at a suitable place (the end of the file is a good place). The example below shows the current configuration of this wiki. After the search index is built (next section), CirrusSearch should work without the advanced setup. More options are described on the Extension:CirrusSearch page, and some undocumented options can be found in its extension.json file.
sudo nano "$IP/LocalSettings.php"
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
$wgAdvancedSearchDeepcatEnabled = false; // https://www.mediawiki.org/wiki/Topic:Uw036nwsilvb6w3t
$wgAdvancedSearchBetaFeature = false; // (enable it by default) https://m.mediawiki.org/wiki/Topic:Upflskaswcvrunka
$wgAdvancedSearchHighlighting = true; // https://www.mediawiki.org/wiki/Manual:Configuration_settings_(alphabetical)
$wgOpenSearchDescriptionLength = 2500; // https://www.mediawiki.org/wiki/Manual:$wgOpenSearchDescriptionLength
## Extension:Elastica
wfLoadExtension( 'Elastica' );
## Extension:CirrusSearch
wfLoadExtension( 'CirrusSearch' );
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log";
// $wgCirrusSearchIndexBaseName = 'wiki_db_name'; // https://www.mediawiki.org/wiki/Extension:CirrusSearch#Configuration
// $wgCirrusSearchServers = [ '10.120.201.1' ]; // The address of the Elasticsearch server if it is not available at 'localhost'
## Extension:CirrusSearch Advanced Setup
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder';
// $wgCirrusSearchCompletionProfiles = 'normal';
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50
$wgCirrusSearchFragmentSize = 200;
// $wgCirrusExploreSimilarResults = true;
// Give more weight to "file_text" so that results from
// the content of PDF files are shown. This requires PdfHandler.
$wgCirrusSearchWeights = [
"title" => 20,
"redirect" => 15,
"category" => 8,
"heading" => 5,
"opening_text" => 3,
"text" => 5,
"auxiliary_text" => 15,
"file_text" => 25
];
// https://www.mediawiki.org/wiki/Help:Namespaces#Localisation
$wgCirrusSearchNamespaceWeights = [
"2" => 0.05,
"4" => 0.3,
"6" => 0.2,
"8" => 0.05,
"10" => 0.005,
"12" => 0.2,
"14" => 0.1
];
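The numeric keys in $wgCirrusSearchNamespaceWeights are namespace IDs. As a reading aid, the sketch below maps the IDs used above to their canonical names, assuming MediaWiki's default namespace numbering; the function name ns_name is hypothetical, and a wiki with custom namespaces will have additional IDs.

```shell
# Hypothetical lookup: canonical names of the namespace IDs used in the
# weights above (MediaWiki default numbering is assumed).
ns_name() {
    case "$1" in
        2)  echo "User" ;;
        4)  echo "Project" ;;
        6)  echo "File" ;;
        8)  echo "MediaWiki" ;;
        10) echo "Template" ;;
        12) echo "Help" ;;
        14) echo "Category" ;;
        *)  echo "unknown" ;;
    esac
}

# Print one "ID name" pair per line for the IDs that carry a weight.
for id in 2 4 6 8 10 12 14; do
    printf '%-3s %s\n' "$id" "$(ns_name "$id")"
done
```

Read this way, the configuration above ranks Template (10), User (2) and MediaWiki (8) pages lowest among the listed namespaces.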
Build Search Index
How to build and update the CirrusSearch/Elasticsearch index is well described in the README and UPGRADE documents, which come with the extension. Here the important steps for building the index for the first time are extracted and converted into a script.
mlw-maintenance-cirrussearch-elasticsearch-create-index.sh
#!/bin/bash
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name /usr/local/bin/mlw-maintenance-cirrusSearch-elasticsearch-create-index-${IP##*/}.sh
# @desc Create an Elasticsearch index for a MediaWiki instance
#
# @source https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README
: ${IP:="/var/www/wiki.metalevel.tech"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"} # The user that owns the $IP directory
# STEP 0
printf -- '\n**\nDisable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' $IP/LocalSettings.php
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' $IP/LocalSettings.php
echo -e "\n\n**\n$IP/LocalSettings.php\n*\n"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 5
# STEP 1
printf -- '\n**\nGenerate ElasticSearch Index for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSearchIndexConfig.php --startOver --conf $IP/LocalSettings.php
echo
sleep 5
# STEP 2
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' $IP/LocalSettings.php
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
sleep 5
printf -- '\n**\n Bootstrap the Search Index for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipLinks --indexOnSkip --conf $IP/LocalSettings.php
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/ForceSearchIndex.php --skipParse --conf $IP/LocalSettings.php
# STEP 3
sleep 5
printf -- '\n**\nEnable Cirrus Search for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' $IP/LocalSettings.php
echo -e "\n\n**\n$IP/LocalSettings.php\n*\n"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' $IP/LocalSettings.php
echo
# Step 4
sleep 5
printf -- '\n**\nUpdate Cirrus Search Suggestions for %s ------------\n*\n\n' "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php $IP/extensions/CirrusSearch/maintenance/UpdateSuggesterIndex.php --conf $IP/LocalSettings.php
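The script switches settings on and off by commenting and uncommenting lines in LocalSettings.php with sed. The sketch below isolates that technique so it can be tried on a scratch copy of the file first. The function names comment_out and uncomment are hypothetical, and GNU sed is assumed for the -i option.

```shell
# Hypothetical helpers isolating the sed toggle used by the script above.
# $1 is the setting name without the leading "$", $2 is the file to edit.

# Turn a line starting with "$<name>" into "// $<name>...".
comment_out() { sed -i 's#^\$'"$1"'#// $'"$1"'#' "$2"; }

# Turn a line starting with "// $<name>" back into "$<name>...".
uncomment() { sed -i 's#^// \$'"$1"'#$'"$1"'#' "$2"; }
```

For example, `comment_out wgSearchType /tmp/LocalSettings.php` followed by `uncomment wgSearchType /tmp/LocalSettings.php` leaves the file as it was. Both patterns are anchored to the start of the line, so running the same helper twice does not add a second "// " prefix.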
To start regular indexing of the wiki's content, according to the configuration made in /var/www/*/LocalSettings.php and the documentation of mw:Extension:CirrusSearch, you need to perform an initial indexing, run the jobs it creates, regenerate the content index, and empty the job queue again. For this purpose you can use the maintenance scripts described in the section MediaWiki.
mw-maintenance-elasticsearch-index.sh
mw-maintenance-runJobs.sh cli
mw-maintenance-rebuildAll.sh
mw-maintenance-runJobs.sh cli
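Because each of these steps depends on the previous one, it may be convenient to run them through a wrapper that stops at the first failure. This is a minimal sketch: the function name run_steps is hypothetical, and it assumes the four scripts are on the PATH and return a non-zero exit status when they fail.

```shell
# Hypothetical wrapper: run each argument as a command, in order,
# and stop at the first one that fails.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        # Word splitting is intentional here, so that "script arg"
        # runs as a command with one argument.
        $step || { echo "FAILED: $step" >&2; return 1; }
    done
}
```

With the scripts listed above, the call would be `run_steps mw-maintenance-elasticsearch-index.sh "mw-maintenance-runJobs.sh cli" mw-maintenance-rebuildAll.sh "mw-maintenance-runJobs.sh cli"`.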
Additional Setup
Access Elasticsearch via SSH Tunnel
This approach is suitable only for testing purposes. Here is a guide on how to set it up:
Elasticsearch Watch Scripts
In addition, the script elasticsearch-watch.sh was developed. A crontab job runs it periodically to check the service and restart it if necessary. The script sends an email to vectoria@altclavis.com if an event occurs.
sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/elasticsearch-watch.sh
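The content of elasticsearch-watch.sh is not shown in this guide. The following is only a hypothetical sketch of what such a watch script could look like, not the author's script. It assumes that Elasticsearch answers plain HTTP on localhost:9200, that the script runs as root (as the crontab above implies), and that a working mail command is installed. The variable names ES_URL and NOTIFY_ADDR and all function names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a watch script; the real elasticsearch-watch.sh
# may work differently.
SERVICE="${SERVICE:-elasticsearch.service}"   # unit name used earlier in this guide
ES_URL="${ES_URL:-http://localhost:9200}"     # assumed HTTP endpoint
NOTIFY_ADDR="${NOTIFY_ADDR:-root}"            # assumed recipient, replace as needed

# Succeeds when Elasticsearch answers over HTTP within 10 seconds.
es_is_up() { curl --silent --fail --max-time 10 "$ES_URL" >/dev/null 2>&1; }

# Restart the unit and report the event by mail.
restart_and_notify() {
    systemctl restart "$SERVICE"
    echo "$SERVICE was restarted on $(hostname) at $(date)" \
        | mail -s "Elasticsearch watch: restart" "$NOTIFY_ADDR"
}

# One check cycle: do nothing when healthy, otherwise restart and notify.
watch_once() {
    if es_is_up; then
        return 0
    fi
    restart_and_notify
}
```

To use it from the cron entry above, a final line calling `watch_once` would be appended to the script. The Restart=always directive configured earlier already covers a crashed process, so a check like this is mainly useful when the process is running but no longer responding.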
References
- BitLaunch: How to install Elasticsearch on Ubuntu 20.04 LTS
- Computing for Geeks: Install Elasticsearch 6.x on Ubuntu 18.04 LTS
- MediaWiki: Extension:CirrusSearch
- Phabricator: Extension:CirrusSearch
- MediaWiki: CirrusSearch Talk – Java version compatibility
- Mincong's blog: GC in Elasticsearch – Basic information about garbage collection (GC) in Elasticsearch, JVM options, GC logging
- Foojay.io: Handling JDK & GC Options Dynamically in Elasticsearch
- Elasticsearch Documentation: Important Elasticsearch configuration
- Elasticsearch Documentation: GC logging