MediaWiki Extension CirrusSearch and Elasticsearch Setup
This is a short manual on how to set up Elasticsearch for use with the MediaWiki extension CirrusSearch, which communicates with the service through the extension Elastica. Choose an Elasticsearch version appropriate for your MediaWiki version. Currently I'm using MediaWiki 1.38, for which Elasticsearch 6.8.23+ is recommended. This version runs well on openjdk-11, which is the default Java version on Ubuntu Server 22.04.
Elasticsearch and the extension Elastica are also required by some other MediaWiki extensions, such as the extension Translate, where they serve as translation memory. Elasticsearch is also used by Nextcloud's application Full text search, and more…
Java and Javac
On Ubuntu Server the default jdk and jre packages can be installed with the following command.
sudo apt install -y apt-transport-https default-jdk default-jre
To check and switch the current version of Java and Javac you can use the following commands.
sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/bin/javac 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
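To confirm the result without reading the interactive menu, the major version of the active Java can be printed by a small script. This is a minimal sketch: the function name java_major_version is my own, and it assumes the usual version strings such as "11.0.15" or the legacy "1.8.0_312".

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string such as
# "11.0.15" (Java 9+) or "1.8.0_312" (Java 8 and older).
java_major_version() {
    local version="$1"
    case "$version" in
        1.*) version="${version#1.}" ;;   # legacy scheme: 1.8.0_312 -> 8.0_312
    esac
    # Keep everything before the first dot, underscore or dash.
    printf '%s\n' "${version%%[._-]*}"
}

# Report the java binary currently on PATH, if there is one.
if command -v java >/dev/null 2>&1; then
    # `java -version` writes a line like: openjdk version "11.0.15" 2022-04-19
    current="$(java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)"
    echo "Java major version in use: $(java_major_version "$current")"
fi
```

For Elasticsearch 6.8.23 the printed major version should be 11, for Elasticsearch 5.x it should be 8.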
Elasticsearch 5.x requires openjdk-8, which can be installed with the following commands. After the installation use the commands above to switch the version in use.
sudo apt install openjdk-8-jre-headless
sudo apt install openjdk-8-jdk-headless
After switching the version of Java you need to restart the Elasticsearch service if it is already installed.
sudo systemctl restart elasticsearch.service
curl 'http://127.0.0.1:9200' # do a test
Elasticsearch
Installation
There are several ways to install Elasticsearch: via Docker, via the Apt repository, via .deb or .rpm packages, etc. I prefer to download it manually and install it from the .deb package. As I said before, for MediaWiki 1.38 we need version 6.8.23+.
cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
sudo apt install ./elasticsearch-6.8.23.deb
# Other versions, for other MediaWiki releases (download only the one you need)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb
After installing the package the Elasticsearch service must be enabled and started.
sudo systemctl enable --now elasticsearch.service
sudo systemctl status elasticsearch.service
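Elasticsearch usually needs some seconds after the service starts before it answers on port 9200, so a check made immediately may fail with "connection refused". The helper below is a sketch of a simple retry loop; the name wait_for and its argument order are my own choice, not part of any package.

```shell
#!/usr/bin/env bash
# Retry a command until it succeeds.
# Usage: wait_for <attempts> <delay-seconds> <command> [args...]
wait_for() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        "$@" && return 0                          # command succeeded, stop waiting
        i=$((i + 1))
        [ "$i" -le "$attempts" ] && sleep "$delay"  # no sleep after the last try
    done
    return 1
}
```

For example, `wait_for 30 2 curl --silent --fail --output /dev/null http://127.0.0.1:9200` tries for about a minute and returns a non-zero status if the node never answers.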
Check
You can check whether the service works properly with the following command.
curl 'http://127.0.0.1:9200'
{
"name" : "W2uxKNc",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "JwkNoPi_THuiCA123-HKMg",
"version" : {
"number" : "6.8.23",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "4f67856",
"build_date" : "2022-01-06T21:30:50.087716Z",
"build_snapshot" : false,
"lucene_version" : "7.7.3",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
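The version check can also be scripted. The sketch below reads the JSON banner shown above and compares its "number" field with a required minimum. Both function names are hypothetical, and it assumes GNU sort (for the -V option) and that jq is not installed, hence the plain sed.

```shell
#!/usr/bin/env bash
# Print the value of "number" from the JSON banner read on stdin.
es_version_from_banner() {
    sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed when version $1 is at least version $2.
# `sort -V` orders version strings, so the smaller of the two comes first.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage, assuming the node listens on the default address: `version_at_least "$(curl -s 'http://127.0.0.1:9200' | es_version_from_banner)" 6.8.23 && echo "version is fine"`.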
More detailed information can be obtained with the following command.
curl -XGET 'http://localhost:9200/_nodes?pretty'
Tweaks
Reduce the allowed heap memory. Even this value seems high to me; on bg.trivictoria.org it is 128m. If there are many concurrent requests, Elasticsearch may exceed the available memory and crash. On the other hand, if the heap is too small it crashes as well.
sudo nano /etc/elasticsearch/jvm.options
#-Xms1g
#-Xmx1g
-Xms512m
-Xmx512m
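The same change can be made without an editor. This is a sketch only: set_es_heap is a hypothetical helper, it assumes the stock file contains uncommented lines of the form -Xms1g and -Xmx1g, and unlike the manual edit above it rewrites those lines in place (keeping a .bak copy) instead of commenting them out.

```shell
#!/usr/bin/env bash
# Set both -Xms and -Xmx in a jvm.options file to the same size.
# Usage: set_es_heap <size> [file]
set_es_heap() {
    local size="$1" file="${2:-/etc/elasticsearch/jvm.options}"
    # Accept only a number followed by k, m or g.
    if ! printf '%s' "$size" | grep -Eq '^[0-9]+[kKmMgG]$'; then
        echo "invalid heap size: $size" >&2
        return 1
    fi
    cp -- "$file" "$file.bak" || return 1     # keep a copy of the previous settings
    # Only uncommented lines are touched; "#-Xms1g" stays as it is.
    sed -i -E \
        -e "s/^-Xms[0-9]+[kKmMgG]\$/-Xms${size}/" \
        -e "s/^-Xmx[0-9]+[kKmMgG]\$/-Xmx${size}/" \
        "$file"
}
```

Run it from a root shell, for example `set_es_heap 512m`, and restart the service afterwards.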
Add directives for automatic restart to the systemd unit.
sudo cp /lib/systemd/system/elasticsearch.service ~/Downloads/elasticsearch.service.default
sudo nano /lib/systemd/system/elasticsearch.service
[Service]
# At the end of the section
Restart=always
RestartSec=3
Apply the changes.
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
sudo systemctl status elasticsearch.service
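Files under /lib/systemd/system belong to the package and may be replaced on upgrade. As an alternative, the same two directives can live in a systemd drop-in file under /etc/systemd/system. The sketch below writes such a file; the function name and the file name restart.conf are my own assumptions, only the directory layout <unit>.d/*.conf comes from systemd.

```shell
#!/usr/bin/env bash
# Write a drop-in that makes systemd restart the service whenever it stops.
# Usage: write_restart_dropin [unit-name] [systemd-config-dir]
write_restart_dropin() {
    local unit="${1:-elasticsearch.service}"
    local base="${2:-/etc/systemd/system}"
    local dir="$base/$unit.d"
    mkdir -p "$dir" || return 1
    # Same directives as in the manual edit, kept outside the packaged unit file.
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=3
EOF
}
```

Call `write_restart_dropin` from a root shell, then run `systemctl daemon-reload` and restart the service as usual.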
To start regular indexing of the wiki content, according to the configuration made in /var/www/*/LocalSettings.php and the documentation at mw:Extension:CirrusSearch, you need to run an initial indexing, run the jobs it creates, rebuild the content index, and then empty the job queue again. The maintenance scripts described in the MediaWiki section can be used for this.
mw-maintenance-elasticsearch-index.sh
mw-maintenance-runJobs.sh cli
mw-maintenance-rebuildAll.sh
mw-maintenance-runJobs.sh cli
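Each of these steps depends on the previous one, so when they are run unattended it helps to stop at the first failure. A minimal sketch of such a wrapper follows; run_steps is a hypothetical helper, and the mw-maintenance-*.sh scripts are assumed to be on the PATH and to return a non-zero status on error.

```shell
#!/usr/bin/env bash
# Run the given command lines in order and stop at the first one that fails.
# Usage: run_steps "<command line>" ["<command line>" ...]
run_steps() {
    local step
    for step in "$@"; do
        echo ">>> $step"
        if ! sh -c "$step"; then
            echo "step failed: $step" >&2
            return 1
        fi
    done
}
```

With the scripts above this becomes `run_steps "mw-maintenance-elasticsearch-index.sh" "mw-maintenance-runJobs.sh cli" "mw-maintenance-rebuildAll.sh" "mw-maintenance-runJobs.sh cli"`.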
In addition, the script elasticsearch-watch.sh was developed. A crontab job runs it periodically to check the service and restart it if necessary. The script sends an email to vectoria@altclavis.com if an event occurs.
sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/elasticsearch-watch.sh
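The watch script itself is not reproduced on this page. The following is only a hypothetical stand-in that shows the idea (check, restart, notify), not the original elasticsearch-watch.sh. The variables ES_URL, ES_UNIT and ALERT_TO, the function names and the --run flag are all assumptions of this sketch, and sending mail assumes a `mail` command such as the one from mailutils.

```shell
#!/usr/bin/env bash
# Hypothetical watch script: check the node, restart it when it is down, report.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"
ES_UNIT="${ES_UNIT:-elasticsearch.service}"
ALERT_TO="${ALERT_TO:-root}"

# Succeed when the node answers an HTTP request within 10 seconds.
es_is_up() {
    curl --silent --fail --max-time 10 --output /dev/null "$ES_URL"
}

es_restart() {
    systemctl restart "$ES_UNIT"
}

# Send the message by mail when possible, otherwise write it to syslog.
es_notify() {
    if command -v mail >/dev/null 2>&1; then
        printf '%s\n' "$1" | mail -s "Elasticsearch watch on $(hostname)" "$ALERT_TO"
    else
        logger -t elasticsearch-watch "$1"
    fi
}

# One watch cycle: do nothing when healthy, otherwise restart and report.
es_watch_once() {
    es_is_up && return 0
    if es_restart; then
        es_notify "Elasticsearch did not respond at $ES_URL and was restarted."
    else
        es_notify "Elasticsearch did not respond at $ES_URL and the restart failed."
        return 1
    fi
}

# Act only when asked to, so the file can also be sourced for testing.
if [ "${1:-}" = "--run" ]; then
    es_watch_once
fi
```

With this sketch the crontab line would call the script with the --run argument and, for example, ALERT_TO set to the address that should receive the messages.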
References
- BitLaunch: How to install Elasticsearch on Ubuntu 20.04 LTS
- Computing for Geeks: Install Elasticsearch 6.x on Ubuntu 18.04 LTS
- MediaWiki: Extension:CirrusSearch
- Phabricator: Extension:CirrusSearch
- MediaWiki: CirrusSearch Talk – Java version compatibility
- Mincong's blog: GC in Elasticsearch – Basic information about garbage collection (GC) in Elasticsearch, JVM options, GC logging
- Foojay.io: Handling JDK & GC Options Dynamically in Elasticsearch
- Elasticsearch Documentation: Important Elasticsearch configuration
- Elasticsearch Documentation: GC logging
Access Elasticsearch via SSH