MediaWiki Extension CirrusSearch and Elasticsearch Setup
Revision as of 11:30, 30 August 2022
This is a short manual on how to set up Elasticsearch for use with the MediaWiki extension CirrusSearch, which communicates with the service through the extension Elastica. Choose an Elasticsearch version appropriate for your MediaWiki version. I am currently using MediaWiki 1.38, for which Elasticsearch 6.8.23+ is recommended. This version runs well on openjdk-11, which is the default Java version on Ubuntu Server 22.04.
Elasticsearch and the extension Elastica are also required by some other MediaWiki extensions, such as Translate, where Elasticsearch is used as a translation memory. It is also used by Nextcloud's Full text search application, among others.
Setup Java and Javac
On Ubuntu Server the default jdk and jre packages can be installed with the following command.
sudo apt install -y apt-transport-https default-jdk default-jre
To check and switch the current version of Java and Javac you can use the following commands.
sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/bin/javac 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
If you are using Elasticsearch 5.x, it requires openjdk-8, which can be installed with the following commands. After the installation, use the commands above to switch the version in use.
sudo apt install openjdk-8-jre-headless
sudo apt install openjdk-8-jdk-headless
After switching the version of Java you need to restart the Elasticsearch service if it is already installed.
sudo systemctl restart elasticsearch.service
curl 'http://127.0.0.1:9200' # do a test
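Before restarting the service it can help to confirm which Java major version is actually active. The following is a minimal sketch, not part of the original setup: the function name `java_major_version` is hypothetical, and it only parses the text that `java -version` prints, handling both the legacy `1.8.0_x` and the modern `11.0.x` numbering schemes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the major Java version found in the text
# produced by `java -version` (which is written to stderr).
java_major_version() {
  version_text="$1"
  # Pull out the quoted version string, e.g. 11.0.16 or 1.8.0_342.
  version=$(printf '%s\n' "$version_text" \
    | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$version" in
    "")  return 1 ;;                         # no version string found
    1.*) version=${version#1.}               # legacy scheme: 1.8.0_342 -> 8
         printf '%s\n' "${version%%.*}" ;;
    *)   printf '%s\n' "${version%%[.+-]*}" ;;  # modern scheme: 11.0.16 -> 11
  esac
}

# Usage (requires Java to be installed):
#   java_major_version "$(java -version 2>&1)"
```

With Elasticsearch 6.8.x the expected result is 11, and with Elasticsearch 5.x it is 8.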
Elasticsearch
There are several ways to install Elasticsearch: via Docker, via the Apt repository, via .deb or .rpm packages, etc. I prefer to download the .deb package manually and install it. As mentioned above, MediaWiki 1.38 needs version 6.8.23+.
cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
sudo apt install ./elasticsearch-6.8.23.deb
# Other versions, for reference (download only the one you need)
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb
After installing the package the Elasticsearch service must be enabled and started.
sudo systemctl enable --now elasticsearch.service
sudo systemctl status elasticsearch.service
You can check whether the service works properly with the following request.
curl 'http://127.0.0.1:9200'
{
  "name" : "HFrziWt",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "5qMVv8CHT3q2vv1sd8hLOw",
  "version" : {
    "number" : "6.5.4",
    "build_flavor" : "default",
    "build_type" : "deb",
    "build_hash" : "d2ef93d",
    "build_date" : "2018-12-17T21:17:40.758843Z",
    "build_snapshot" : false,
    "lucene_version" : "7.5.0",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
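The response above reports the running version in the `number` field, which is worth comparing with the version you meant to install. As a rough sketch, the field can be extracted with `sed` alone; the function name `es_version_from_banner` is hypothetical, and the pattern assumes the response format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the Elasticsearch banner JSON on stdin and
# print the value of the "number" field (the running version).
es_version_from_banner() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Usage (requires a running service):
#   curl -s 'http://127.0.0.1:9200' | es_version_from_banner
```

If `jq` is available, it is a more robust choice than a regular expression for parsing the JSON.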
Reduce the allowed memory. Even this value seems like a lot to me; on bg.trivictoria.org it is 128m. If there are many concurrent requests, Elasticsearch may exceed the available memory and crash. On the other hand, if the value is too small, it crashes as well.
sudo nano /etc/elasticsearch/jvm.options
#-Xms1g
#-Xmx1g
-Xms512m
-Xmx512m
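The Elasticsearch documentation recommends setting the minimum and maximum heap size to the same value, as in the example above. A minimal sketch for verifying this after editing the file follows; the function names `jvm_heap_value` and `heap_sizes_match` are hypothetical, and only uncommented `-Xms`/`-Xmx` lines are considered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active (uncommented) value of a heap
# flag, e.g. "Xms" or "Xmx", from a jvm.options file. If the flag is
# given more than once, the last occurrence is printed.
jvm_heap_value() {
  flag="$1"
  file="$2"
  sed -n "s/^-${flag}\([0-9][0-9]*[kKmMgG]\{0,1\}\)[[:space:]]*\$/\1/p" "$file" \
    | tail -n 1
}

# Succeed only when both flags are set and have the same value.
heap_sizes_match() {
  xms=$(jvm_heap_value Xms "$1")
  xmx=$(jvm_heap_value Xmx "$1")
  [ -n "$xms" ] && [ "$xms" = "$xmx" ]
}

# Usage:
#   heap_sizes_match /etc/elasticsearch/jvm.options && echo "heap OK"
```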
Add directives for automatic restart to the systemd unit.
sudo cp /lib/systemd/system/elasticsearch.service ~/Downloads/elasticsearch.service.default
sudo nano /lib/systemd/system/elasticsearch.service
[Service]
# At the end of the section
Restart=always
RestartSec=3
Apply the changes.
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
sudo systemctl status elasticsearch.service
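Files under /lib/systemd/system belong to the package and may be overwritten on upgrade. As an alternative to editing the unit directly, systemd also reads drop-in files from /etc/systemd/system/elasticsearch.service.d/. The following is a sketch of that approach, not the method used above; the function name `write_restart_dropin` and the file name `restart.conf` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a drop-in file with the same restart
# directives into the given drop-in directory.
write_restart_dropin() {
  dropin_dir="$1"
  mkdir -p "$dropin_dir" || return 1
  cat > "$dropin_dir/restart.conf" <<'EOF'
[Service]
Restart=always
RestartSec=3
EOF
}

# Usage (as root), followed by the same daemon-reload and restart:
#   write_restart_dropin /etc/systemd/system/elasticsearch.service.d
#   systemctl daemon-reload
#   systemctl restart elasticsearch.service
```

The command `sudo systemctl edit elasticsearch.service` achieves the same result interactively by creating an override file in that directory.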
To start regular indexing of the wiki content, according to the configuration made in /var/www/*/LocalSettings.php and the mw:Extension:CirrusSearch documentation, you need to run an initial indexing, run the jobs it creates, regenerate the content index, and then empty the job queue again. The maintenance scripts described in the MediaWiki section can be used for this purpose.
mw-maintenance-elasticsearch-index.sh
mw-maintenance-runJobs.sh cli
mw-maintenance-rebuildAll.sh
mw-maintenance-runJobs.sh cli
In addition, the script elasticsearch-watch.sh was developed; a crontab job runs it to perform a periodic check and restart the service when necessary. The script sends an email to vectoria@altclavis.com if an event occurs.
sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/elasticsearch-watch.sh
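The contents of elasticsearch-watch.sh are not shown on this page. Purely as an illustration of the idea, a watchdog of this kind might look like the sketch below. Everything in it is an assumption: the function names, the `ES_URL` and `ALERT_EMAIL` variables, the use of the HTTP status code as the health signal, and the availability of a `mail` command.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a watchdog; NOT the author's elasticsearch-watch.sh.
ES_URL="${ES_URL:-http://127.0.0.1:9200}"   # assumed endpoint
ALERT_EMAIL="${ALERT_EMAIL:-}"              # assumed: leave empty to skip mail

# Print the HTTP status of the endpoint; curl prints 000 when it gets no response.
es_http_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' "$1"
}

# Succeed (exit status 0) when the status code means a restart is needed.
needs_restart() {
  [ "$1" != "200" ]
}

# One check cycle: probe, restart on failure, optionally send a notice.
watch_once() {
  status=$(es_http_status "$ES_URL")
  if needs_restart "$status"; then
    systemctl restart elasticsearch.service
    if [ -n "$ALERT_EMAIL" ] && command -v mail >/dev/null 2>&1; then
      echo "Elasticsearch returned HTTP $status and was restarted." \
        | mail -s "Elasticsearch restarted" "$ALERT_EMAIL"
    fi
  fi
}

# When installing this as a cron script, call the check at the end:
#   watch_once
```

Keeping the decision in a small function such as `needs_restart` makes it possible to test the logic without a running service.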
References
- BitLaunch: How to install Elasticsearch on Ubuntu 20.04 LTS
- Computing for Geeks: Install Elasticsearch 6.x on Ubuntu 18.04 LTS
- MediaWiki: Extension:CirrusSearch
- Phabricator: Extension:CirrusSearch
- MediaWiki: CirrusSearch Talk – Java version compatibility
- Mincong's blog: GC in Elasticsearch – Basic information about garbage collection (GC) in Elasticsearch, JVM options, GC logging
- Foojay.io: Handling JDK & GC Options Dynamically in Elasticsearch
- Elasticsearch Documentation: Important Elasticsearch configuration
- Elasticsearch Documentation: GC logging
Access Elasticsearch via SSH
-REVIEW-
ElasticSearch
Required by MW:Extension:CirrusSearch, some other mw:extensions and some extensions of NextCloud.
~/Downloads/elastic-search/
sudo apt install ./elasticsearch-5.6.16.deb
Add an automatic restart for elasticsearch.service, since it sometimes "crashes":
sudo nano /lib/systemd/system/elasticsearch.service
[Service]
...
Restart=always
RestartSec=3
Java
Downgrade required by MW:Extension:CirrusSearch and ElasticSearch 5.6.16.