MediaWiki Extension CirrusSearch and Elasticsearch Setup
Latest revision as of 20:22, 5 March 2023
This is a short manual on how to set up Elasticsearch for use with the MediaWiki extension CirrusSearch, which communicates with the service through the extension Elastica. You should choose an Elasticsearch version appropriate to your MediaWiki version. Currently I'm using MediaWiki 1.39, for which Elasticsearch 7.10.2 is recommended. This version runs well on openjdk-11, which is the default Java version on Ubuntu Server 22.04.
Elasticsearch and the extension Elastica are also required by some other MediaWiki extensions, such as the extension Translate, where Elasticsearch is used as translation memory. It is also used by Nextcloud's application Full text search and more…
See also MediaWiki Job Queue.
Java Setup
On Ubuntu Server the default jdk and jre packages can be installed with the following command.
sudo apt install -y apt-transport-https default-jdk default-jre
To check and switch the current version of Java and Javac you can use the following commands.
sudo update-alternatives --config java
There are 2 choices for the alternative java (providing /usr/bin/java).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/java 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
sudo update-alternatives --config javac
There are 2 choices for the alternative javac (providing /usr/bin/javac).
Selection Path Priority Status
------------------------------------------------------------
0 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 auto mode
* 1 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac 1111 manual mode
2 /usr/lib/jvm/java-8-openjdk-amd64/bin/javac 1081 manual mode
Press <enter> to keep the current choice[*], or type selection number: 1
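For scripted checks it can be handy to read the active Java major version without the interactive dialog. The following is a minimal sketch, not part of the original setup: the helper name java_major is hypothetical, and it assumes the usual first line printed by java -version, for example openjdk version "11.0.18" or openjdk version "1.8.0_362".

```shell
# java_major LINE
# Hypothetical helper: prints the Java major version found in the
# first line of `java -version` output (passed in as one argument).
java_major() {
  ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # old scheme: 1.8.0_362 is Java 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # new scheme: 11.0.18 is Java 11
  esac
}
```

A possible call is java_major "$(java -version 2>&1 | head -n 1)"; note that java -version writes to stderr, hence the redirection.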
Elasticsearch 5.x requires openjdk-8, which can be installed with the following commands. After the installation, use the commands above to switch the version in use.
sudo apt install openjdk-8-jre-headless
sudo apt install openjdk-8-jdk-headless
After switching the version of Java you need to restart the Elasticsearch service if it is already installed.
sudo systemctl restart elasticsearch.service
curl 'http://127.0.0.1:9200' # do a test
Elasticsearch Setup
Installation
There are several ways to install Elasticsearch: via Docker, via the Apt repository, via .deb or .rpm packages, etc. I prefer to download the .deb package and install it manually. As mentioned above, MediaWiki 1.39+ needs Elasticsearch version 7.10.2.
cd ~/Downloads
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-amd64.deb
sudo apt install ./elasticsearch-7.10.2-amd64.deb
The packages for some other versions are available at the following URLs; download only the one you need.
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.6.16.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.5.4.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.8.23.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.0.1-amd64.deb
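A downloaded package can be checked against its published checksum before it is installed. The sketch below is an optional addition: the helper name sha512_matches is hypothetical, and it assumes that a checksum file is published next to the package under the same URL with .sha512 appended, as Elastic's install documentation describes.

```shell
# sha512_matches FILE SUMFILE
# Hypothetical helper: succeeds only if the SHA-512 hash of FILE equals
# the first field of SUMFILE (format: "<hash>  <file name>").
sha512_matches() {
  expected=$(cut -d' ' -f1 < "$2")
  actual=$(sha512sum "$1" | cut -d' ' -f1)
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

Assuming the companion file was fetched with wget from the package URL plus .sha512, a call could look like sha512_matches elasticsearch-7.10.2-amd64.deb elasticsearch-7.10.2-amd64.deb.sha512 && echo "checksum OK".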
After installing the package the Elasticsearch service must be enabled and started.
sudo systemctl enable --now elasticsearch.service # enable and start the service
systemctl status elasticsearch.service # check the status of the service
systemctl cat elasticsearch.service # check the current service's configuration
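Elasticsearch usually needs some seconds after the service starts before it answers HTTP requests, so a script that continues right after systemctl may hit a refused connection. The following is a small sketch of a generic retry loop; the helper name wait_until and the attempt and delay values are illustrative assumptions.

```shell
# wait_until ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between tries. Output is discarded.
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" >/dev/null 2>&1 && return 0
    # Sleep only if another attempt follows.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

For example, wait_until 30 2 curl -fsS http://127.0.0.1:9200 && echo "Elasticsearch is up" would poll the local node for up to about a minute.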
Check
You can check whether the service works properly with the following command.
curl 'http://127.0.0.1:9200'
{
"name" : "metalevel.tech",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "znG-mCHAQU6L3oVR9UIthg",
"version" : {
"number" : "7.10.2",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
"build_date" : "2021-01-13T00:42:12.435326Z",
"build_snapshot" : false,
"lucene_version" : "8.7.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
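When the check is automated, only the version number from the response above is usually of interest. The following is a minimal sketch that extracts it with sed, so no JSON tool has to be installed; the helper name es_version_from_json is hypothetical, and the pattern assumes the response layout shown above.

```shell
# es_version_from_json
# Hypothetical helper: reads the root endpoint's JSON response on stdin
# and prints the value of the first "number" field.
es_version_from_json() {
  sed -n 's/.*"number"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

It could be used as curl -s 'http://127.0.0.1:9200' | es_version_from_json, and the result compared with the version you expect for your MediaWiki release.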
More detailed information can be obtained with the following command.
curl -XGET 'http://localhost:9200/_nodes?pretty'
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"cluster_name" : "elasticsearch",
"nodes" : {
"BNfFzNWMTF20Xd5nlcwt6w" : {
"name" : "metalevel.tech",
"transport_address" : "127.0.0.1:9300",
"host" : "127.0.0.1",
"ip" : "127.0.0.1",
"version" : "7.10.2",
"build_flavor" : "default",
"build_type" : "deb",
"build_hash" : "747e1cc71def077253878a59143c1f785afa92b9",
"total_indexing_buffer" : 214748364,
"roles" : [
"data",
"data_cold",
"data_content",
"data_hot",
"data_warm",
"ingest",
"master",
"ml",
"remote_cluster_client",
"transform"
],
"attributes" : {
"ml.machine_memory" : "25112887296",
"xpack.installed" : "true",
"transform.node" : "true",
"ml.max_open_jobs" : "20"
},
"settings" : {
"client" : {
"type" : "node"
},
"cluster" : {
"name" : "elasticsearch",
"election" : {
"strategy" : "supports_voting_only"
}
},
"http" : {
"type" : "security4",
"type.default" : "netty4"
},
"node" : {
"attr" : {
"transform" : {
"node" : "true"
},
"xpack" : {
"installed" : "true"
},
"ml" : {
"machine_memory" : "25112887296",
"max_open_jobs" : "20"
}
},
"name" : "metalevel.tech",
"pidfile" : "/var/run/elasticsearch/elasticsearch.pid"
},
"path" : {
"data" : [
"/var/lib/elasticsearch"
],
"logs" : "/var/log/elasticsearch",
"home" : "/usr/share/elasticsearch"
},
"transport" : {
"type" : "security4",
"features" : {
"x-pack" : "true"
},
"type.default" : "netty4"
}
},
"os" : {
"refresh_interval_in_millis" : 1000,
"name" : "Linux",
"pretty_name" : "Ubuntu 22.04.2 LTS",
"arch" : "amd64",
"version" : "5.15.0-67-generic",
"available_processors" : 16,
"allocated_processors" : 16
},
"process" : {
"refresh_interval_in_millis" : 1000,
"id" : 422075,
"mlockall" : false
},
"jvm" : {
"pid" : 422075,
"version" : "15.0.1",
"vm_name" : "OpenJDK 64-Bit Server VM",
"vm_version" : "15.0.1+9",
"vm_vendor" : "AdoptOpenJDK",
"bundled_jdk" : true,
"using_bundled_jdk" : true,
"start_time_in_millis" : 1677873247146,
"mem" : {
"heap_init_in_bytes" : 2147483648,
"heap_max_in_bytes" : 2147483648,
"non_heap_init_in_bytes" : 7667712,
"non_heap_max_in_bytes" : 0,
"direct_max_in_bytes" : 0
},
"gc_collectors" : [
"G1 Young Generation",
"G1 Old Generation"
],
"memory_pools" : [
"CodeHeap 'non-nmethods'",
"Metaspace",
"CodeHeap 'profiled nmethods'",
"Compressed Class Space",
"G1 Eden Space",
"G1 Old Gen",
"G1 Survivor Space",
"CodeHeap 'non-profiled nmethods'"
],
"using_compressed_ordinary_object_pointers" : "true",
"input_arguments" : [
"-Xshare:auto",
"-Des.networkaddress.cache.ttl=60",
"-Des.networkaddress.cache.negative.ttl=10",
"-XX:+AlwaysPreTouch",
"-Xss1m",
"-Djava.awt.headless=true",
"-Dfile.encoding=UTF-8",
"-Djna.nosys=true",
"-XX:-OmitStackTraceInFastThrow",
"-XX:+ShowCodeDetailsInExceptionMessages",
"-Dio.netty.noUnsafe=true",
"-Dio.netty.noKeySetOptimization=true",
"-Dio.netty.recycler.maxCapacityPerThread=0",
"-Dio.netty.allocator.numDirectArenas=0",
"-Dlog4j.shutdownHookEnabled=false",
"-Dlog4j2.disable.jmx=true",
"-Djava.locale.providers=SPI,COMPAT",
"-Xms2g",
"-Xmx2g",
"-XX:+UseG1GC",
"-XX:G1ReservePercent=25",
"-XX:InitiatingHeapOccupancyPercent=30",
"-Des.networkaddress.cache.ttl=60",
"-Des.networkaddress.cache.negative.ttl=10",
"-XX:+AlwaysPreTouch",
"-Xss1m",
"-Djava.awt.headless=true",
"-Dfile.encoding=UTF-8",
"-Djna.nosys=true",
"-XX:-OmitStackTraceInFastThrow",
"-XX:+ShowCodeDetailsInExceptionMessages",
"-Dio.netty.noUnsafe=true",
"-Dio.netty.noKeySetOptimization=true",
"-Dio.netty.recycler.maxCapacityPerThread=0",
"-Dlog4j.shutdownHookEnabled=false",
"-Dlog4j2.disable.jmx=true",
"-Dlog4j2.formatMsgNoLookups=true",
"-Djava.io.tmpdir=/tmp/elasticsearch-5908758710646640467",
"-XX:+HeapDumpOnOutOfMemoryError",
"-XX:HeapDumpPath=/var/lib/elasticsearch",
"-XX:ErrorFile=/var/log/elasticsearch/hs_err_pid%p.log",
"-Xlog:gc*,gc+age=trace,safepoint:file=/var/log/elasticsearch/gc.log:utctime,pid,tags:filecount=32,filesize=64m",
"-Djava.locale.providers=COMPAT",
"-XX:UseAVX=0",
"-XX:MaxDirectMemorySize=1073741824",
"-Des.path.home=/usr/share/elasticsearch",
"-Des.path.conf=/etc/elasticsearch",
"-Des.distribution.flavor=default",
"-Des.distribution.type=deb",
"-Des.bundled_jdk=true"
]
},
"thread_pool" : {
"force_merge" : {
"type" : "fixed",
"size" : 1,
"queue_size" : -1
},
"ml_datafeed" : {
"type" : "scaling",
"core" : 1,
"max" : 512,
"keep_alive" : "1m",
"queue_size" : -1
},
"searchable_snapshots_cache_fetch_async" : {
"type" : "scaling",
"core" : 0,
"max" : 32,
"keep_alive" : "30s",
"queue_size" : -1
},
"fetch_shard_started" : {
"type" : "scaling",
"core" : 1,
"max" : 32,
"keep_alive" : "5m",
"queue_size" : -1
},
"listener" : {
"type" : "fixed",
"size" : 8,
"queue_size" : -1
},
"rollup_indexing" : {
"type" : "fixed",
"size" : 4,
"queue_size" : 4
},
"search" : {
"type" : "fixed_auto_queue_size",
"size" : 25,
"queue_size" : 1000
},
"security-crypto" : {
"type" : "fixed",
"size" : 8,
"queue_size" : 1000
},
"ccr" : {
"type" : "fixed",
"size" : 32,
"queue_size" : 100
},
"flush" : {
"type" : "scaling",
"core" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"fetch_shard_store" : {
"type" : "scaling",
"core" : 1,
"max" : 32,
"keep_alive" : "5m",
"queue_size" : -1
},
"ml_utility" : {
"type" : "scaling",
"core" : 1,
"max" : 2048,
"keep_alive" : "10m",
"queue_size" : -1
},
"get" : {
"type" : "fixed",
"size" : 16,
"queue_size" : 1000
},
"system_read" : {
"type" : "fixed",
"size" : 5,
"queue_size" : 2000
},
"transform_indexing" : {
"type" : "fixed",
"size" : 4,
"queue_size" : 4
},
"write" : {
"type" : "fixed",
"size" : 16,
"queue_size" : 10000
},
"watcher" : {
"type" : "fixed",
"size" : 50,
"queue_size" : 1000
},
"security-token-key" : {
"type" : "fixed",
"size" : 1,
"queue_size" : 1000
},
"refresh" : {
"type" : "scaling",
"core" : 1,
"max" : 8,
"keep_alive" : "5m",
"queue_size" : -1
},
"system_write" : {
"type" : "fixed",
"size" : 5,
"queue_size" : 1000
},
"generic" : {
"type" : "scaling",
"core" : 4,
"max" : 128,
"keep_alive" : "30s",
"queue_size" : -1
},
"warmer" : {
"type" : "scaling",
"core" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"management" : {
"type" : "scaling",
"core" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"analyze" : {
"type" : "fixed",
"size" : 1,
"queue_size" : 16
},
"searchable_snapshots_cache_prewarming" : {
"type" : "scaling",
"core" : 0,
"max" : 32,
"keep_alive" : "30s",
"queue_size" : -1
},
"ml_job_comms" : {
"type" : "scaling",
"core" : 4,
"max" : 2048,
"keep_alive" : "1m",
"queue_size" : -1
},
"snapshot" : {
"type" : "scaling",
"core" : 1,
"max" : 5,
"keep_alive" : "5m",
"queue_size" : -1
},
"search_throttled" : {
"type" : "fixed_auto_queue_size",
"size" : 1,
"queue_size" : 100
}
},
"transport" : {
"bound_address" : [
"[::1]:9300",
"127.0.0.1:9300"
],
"publish_address" : "127.0.0.1:9300",
"profiles" : { }
},
"http" : {
"bound_address" : [
"[::1]:9200",
"127.0.0.1:9200"
],
"publish_address" : "127.0.0.1:9200",
"max_content_length_in_bytes" : 104857600
},
"plugins" : [ ],
"modules" : [
{
"name" : "aggs-matrix-stats",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Adds aggregations whose input are a list of numeric fields and output includes a matrix.",
"classname" : "org.elasticsearch.search.aggregations.matrix.MatrixAggregationPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "analysis-common",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Adds \"built in\" analyzers to Elasticsearch.",
"classname" : "org.elasticsearch.analysis.common.CommonAnalysisPlugin",
"extended_plugins" : [
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "constant-keyword",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Module for the constant-keyword field type, which is a specialization of keyword for the case when all documents have the same value.",
"classname" : "org.elasticsearch.xpack.constantkeyword.ConstantKeywordMapperPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "flattened",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Module for the flattened field type, which allows JSON objects to be flattened into a single field.",
"classname" : "org.elasticsearch.xpack.flattened.FlattenedMapperPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "frozen-indices",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin for the frozen indices functionality",
"classname" : "org.elasticsearch.xpack.frozen.FrozenIndices",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "ingest-common",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Module for ingest processors that do not require additional security permissions or have large dependencies and resources",
"classname" : "org.elasticsearch.ingest.common.IngestCommonPlugin",
"extended_plugins" : [
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "ingest-geoip",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Ingest processor that uses looksup geo data based on ip adresses using the Maxmind geo database",
"classname" : "org.elasticsearch.ingest.geoip.IngestGeoIpPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "ingest-user-agent",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Ingest processor that extracts information from a user agent",
"classname" : "org.elasticsearch.ingest.useragent.IngestUserAgentPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "kibana",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Plugin exposing APIs for Kibana system indices",
"classname" : "org.elasticsearch.kibana.KibanaPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "lang-expression",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Lucene expressions integration for Elasticsearch",
"classname" : "org.elasticsearch.script.expression.ExpressionPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "lang-mustache",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Mustache scripting integration for Elasticsearch",
"classname" : "org.elasticsearch.script.mustache.MustachePlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "lang-painless",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "An easy, safe and fast scripting language for Elasticsearch",
"classname" : "org.elasticsearch.painless.PainlessPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "mapper-extras",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Adds advanced field mappers",
"classname" : "org.elasticsearch.index.mapper.MapperExtrasPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "mapper-version",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin for a field type to store sofware versions",
"classname" : "org.elasticsearch.xpack.versionfield.VersionFieldPlugin",
"extended_plugins" : [
"x-pack-core",
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "parent-join",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "This module adds the support parent-child queries and aggregations",
"classname" : "org.elasticsearch.join.ParentJoinPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "percolator",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Percolator module adds capability to index queries and query these queries by specifying documents",
"classname" : "org.elasticsearch.percolator.PercolatorPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "rank-eval",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "The Rank Eval module adds APIs to evaluate ranking quality.",
"classname" : "org.elasticsearch.index.rankeval.RankEvalPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "reindex",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "The Reindex module adds APIs to reindex from one index to another or update documents in place.",
"classname" : "org.elasticsearch.index.reindex.ReindexPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "repositories-metering-api",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Repositories metering API",
"classname" : "org.elasticsearch.xpack.repositories.metering.RepositoriesMeteringPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "repository-url",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Module for URL repository",
"classname" : "org.elasticsearch.plugin.repository.url.URLRepositoryPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "search-business-rules",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin for applying business rules to search result rankings",
"classname" : "org.elasticsearch.xpack.searchbusinessrules.SearchBusinessRules",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "searchable-snapshots",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin for the searchable snapshots functionality",
"classname" : "org.elasticsearch.xpack.searchablesnapshots.SearchableSnapshots",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "spatial",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin for Basic Spatial features",
"classname" : "org.elasticsearch.xpack.spatial.SpatialPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "systemd",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Integrates Elasticsearch with systemd",
"classname" : "org.elasticsearch.systemd.SystemdPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "transform",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin to transform data",
"classname" : "org.elasticsearch.xpack.transform.Transform",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "transport-netty4",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Netty 4 based transport implementation",
"classname" : "org.elasticsearch.transport.Netty4Plugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "unsigned-long",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Module for the unsigned long field type",
"classname" : "org.elasticsearch.xpack.unsignedlong.UnsignedLongMapperPlugin",
"extended_plugins" : [
"x-pack-core",
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "vectors",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin for working with vectors",
"classname" : "org.elasticsearch.xpack.vectors.Vectors",
"extended_plugins" : [
"x-pack-core",
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "wildcard",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A plugin for a keyword field type with efficient wildcard search",
"classname" : "org.elasticsearch.xpack.wildcard.Wildcard",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-analytics",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Analytics",
"classname" : "org.elasticsearch.xpack.analytics.AnalyticsPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-async",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A module which handles common async operations",
"classname" : "org.elasticsearch.xpack.async.AsyncResultsIndexPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-async-search",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "A module which allows to track the progress of a search asynchronously.",
"classname" : "org.elasticsearch.xpack.search.AsyncSearch",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-autoscaling",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Autoscaling",
"classname" : "org.elasticsearch.xpack.autoscaling.Autoscaling",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-ccr",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - CCR",
"classname" : "org.elasticsearch.xpack.ccr.Ccr",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-core",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Core",
"classname" : "org.elasticsearch.xpack.core.XPackPlugin",
"extended_plugins" : [ ],
"has_native_controller" : false
},
{
"name" : "x-pack-data-streams",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Data Streams",
"classname" : "org.elasticsearch.xpack.datastreams.DataStreamsPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-deprecation",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Deprecation",
"classname" : "org.elasticsearch.xpack.deprecation.Deprecation",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-enrich",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Enrich",
"classname" : "org.elasticsearch.xpack.enrich.EnrichPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-eql",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "The Elasticsearch plugin that powers EQL for Elasticsearch",
"classname" : "org.elasticsearch.xpack.eql.plugin.EqlPlugin",
"extended_plugins" : [
"x-pack-ql",
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "x-pack-graph",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Graph",
"classname" : "org.elasticsearch.xpack.graph.Graph",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-identity-provider",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Identity Provider",
"classname" : "org.elasticsearch.xpack.idp.IdentityProviderPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-ilm",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Index Lifecycle Management",
"classname" : "org.elasticsearch.xpack.ilm.IndexLifecycle",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-logstash",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Logstash",
"classname" : "org.elasticsearch.xpack.logstash.Logstash",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-ml",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Machine Learning",
"classname" : "org.elasticsearch.xpack.ml.MachineLearning",
"extended_plugins" : [
"x-pack-core",
"lang-painless"
],
"has_native_controller" : true
},
{
"name" : "x-pack-monitoring",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Monitoring",
"classname" : "org.elasticsearch.xpack.monitoring.Monitoring",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-ql",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch infrastructure plugin for EQL and SQL for Elasticsearch",
"classname" : "org.elasticsearch.xpack.ql.plugin.QlPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-rollup",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Rollup",
"classname" : "org.elasticsearch.xpack.rollup.Rollup",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-security",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Security",
"classname" : "org.elasticsearch.xpack.security.Security",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-sql",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "The Elasticsearch plugin that powers SQL for Elasticsearch",
"classname" : "org.elasticsearch.xpack.sql.plugin.SqlPlugin",
"extended_plugins" : [
"x-pack-ql",
"lang-painless"
],
"has_native_controller" : false
},
{
"name" : "x-pack-stack",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Stack",
"classname" : "org.elasticsearch.xpack.stack.StackPlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-voting-only-node",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Voting-only node",
"classname" : "org.elasticsearch.cluster.coordination.VotingOnlyNodePlugin",
"extended_plugins" : [
"x-pack-core"
],
"has_native_controller" : false
},
{
"name" : "x-pack-watcher",
"version" : "7.10.2",
"elasticsearch_version" : "7.10.2",
"java_version" : "1.8",
"description" : "Elasticsearch Expanded Pack Plugin - Watcher",
"classname" : "org.elasticsearch.xpack.watcher.Watcher",
"extended_plugins" : [
"x-pack-core",
"lang-painless"
],
"has_native_controller" : false
}
],
"ingest" : {
"processors" : [
{
"type" : "append"
},
{
"type" : "bytes"
},
{
"type" : "circle"
},
{
"type" : "convert"
},
{
"type" : "csv"
},
{
"type" : "date"
},
{
"type" : "date_index_name"
},
{
"type" : "dissect"
},
{
"type" : "dot_expander"
},
{
"type" : "drop"
},
{
"type" : "enrich"
},
{
"type" : "fail"
},
{
"type" : "foreach"
},
{
"type" : "geoip"
},
{
"type" : "grok"
},
{
"type" : "gsub"
},
{
"type" : "html_strip"
},
{
"type" : "inference"
},
{
"type" : "join"
},
{
"type" : "json"
},
{
"type" : "kv"
},
{
"type" : "lowercase"
},
{
"type" : "pipeline"
},
{
"type" : "remove"
},
{
"type" : "rename"
},
{
"type" : "script"
},
{
"type" : "set"
},
{
"type" : "set_security_user"
},
{
"type" : "sort"
},
{
"type" : "split"
},
{
"type" : "trim"
},
{
"type" : "uppercase"
},
{
"type" : "urldecode"
},
{
"type" : "user_agent"
}
]
},
"aggregations" : {
"adjacency_matrix" : {
"types" : [
"other"
]
},
"auto_date_histogram" : {
"types" : [
"boolean",
"date",
"numeric"
]
},
"avg" : {
"types" : [
"boolean",
"date",
"histogram",
"numeric"
]
},
"boxplot" : {
"types" : [
"histogram",
"numeric"
]
},
"cardinality" : {
"types" : [
"boolean",
"bytes",
"date",
"geopoint",
"geoshape",
"ip",
"numeric",
"range"
]
},
"children" : {
"types" : [
"other"
]
},
"composite" : {
"types" : [
"other"
]
},
"date_histogram" : {
"types" : [
"boolean",
"date",
"numeric",
"range"
]
},
"date_range" : {
"types" : [
"boolean",
"date",
"numeric"
]
},
"diversified_sampler" : {
"types" : [
"boolean",
"bytes",
"date",
"numeric"
]
},
"extended_stats" : {
"types" : [
"boolean",
"date",
"numeric"
]
},
"filter" : {
"types" : [
"other"
]
},
"filters" : {
"types" : [
"other"
]
},
"geo_bounds" : {
"types" : [
"geopoint",
"geoshape"
]
},
"geo_centroid" : {
"types" : [
"geopoint",
"geoshape"
]
},
"geo_distance" : {
"types" : [
"geopoint"
]
},
"geohash_grid" : {
"types" : [
"geopoint",
"geoshape"
]
},
"geotile_grid" : {
"types" : [
"geopoint",
"geoshape"
]
},
"global" : {
"types" : [
"other"
]
},
"histogram" : {
"types" : [
"boolean",
"date",
"histogram",
"numeric",
"range"
]
},
"ip_range" : {
"types" : [
"ip"
]
},
"matrix_stats" : {
"types" : [
"other"
]
},
"max" : {
"types" : [
"boolean",
"date",
"histogram",
"numeric"
]
},
"median_absolute_deviation" : {
"types" : [
"numeric"
]
},
"min" : {
"types" : [
"boolean",
"date",
"histogram",
"numeric"
]
},
"missing" : {
"types" : [
"boolean",
"bytes",
"date",
"geopoint",
"ip",
"numeric",
"range"
]
},
"nested" : {
"types" : [
"other"
]
},
"parent" : {
"types" : [
"other"
]
},
"percentile_ranks" : {
"types" : [
"boolean",
"date",
"histogram",
"numeric"
]
},
"percentiles" : {
"types" : [
"boolean",
"date",
"histogram",
"numeric"
]
},
"range" : {
"types" : [
"boolean",
"date",
"numeric"
]
},
"rare_terms" : {
"types" : [
"boolean",
"bytes",
"date",
"ip",
"numeric"
]
},
"rate" : {
"types" : [
"boolean",
"numeric"
]
},
"reverse_nested" : {
"types" : [
"other"
]
},
"sampler" : {
"types" : [
"other"
]
},
"scripted_metric" : {
"types" : [
"other"
]
},
"significant_terms" : {
"types" : [
"boolean",
"bytes",
"date",
"ip",
"numeric"
]
},
"significant_text" : {
"types" : [
"other"
]
},
"stats" : {
"types" : [
"boolean",
"date",
"numeric"
]
},
"string_stats" : {
"types" : [
"bytes"
]
},
"sum" : {
"types" : [
"boolean",
"date",
"histogram",
"numeric"
]
},
"t_test" : {
"types" : [
"numeric"
]
},
"terms" : {
"types" : [
"boolean",
"bytes",
"date",
"ip",
"numeric"
]
},
"top_hits" : {
"types" : [
"other"
]
},
"top_metrics" : {
"types" : [
"other"
]
},
"value_count" : {
"types" : [
"boolean",
"bytes",
"date",
"geopoint",
"geoshape",
"histogram",
"ip",
"numeric",
"range"
]
},
"variable_width_histogram" : {
"types" : [
"numeric"
]
},
"weighted_avg" : {
"types" : [
"numeric"
]
}
}
}
}
}
Tweaks
Elasticsearch can use a large amount of RAM, but in my tests on small instances it works even with a heap of only 128m. The main configuration files are located in the directory /etc/elasticsearch/. You can adjust the amount of RAM in use by changing the relevant lines in the file jvm.options. Note that the Xms and Xmx values must be equal.
sudo nano /etc/elasticsearch/jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
#-Xms512m
#-Xmx512m
-Xms4g
-Xmx4g
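Because the two heap values must match, a quick consistency check after editing can be useful. The following is a minimal sketch and not part of the original setup: the function name check_heap_equal is hypothetical, and it assumes the active settings are uncommented lines that start with -Xms and -Xmx, as in the example above.

```shell
#!/bin/bash
# Hypothetical helper: compare the active (uncommented) -Xms and -Xmx values.
check_heap_equal() {
    local file="$1" xms xmx
    # Take the last uncommented occurrence of each option and strip the prefix.
    xms="$(grep -E '^-Xms' "$file" | tail -n 1 | sed 's/^-Xms//')"
    xmx="$(grep -E '^-Xmx' "$file" | tail -n 1 | sed 's/^-Xmx//')"
    if [ -n "$xms" ] && [ "$xms" = "$xmx" ]; then
        echo "OK: heap is fixed at ${xms}"
    else
        echo "MISMATCH: Xms='${xms}' Xmx='${xmx}'"
        return 1
    fi
}

# Usage (path taken from the section above):
# check_heap_equal /etc/elasticsearch/jvm.options
```

The function only reads the file, so it can be run before restarting the service.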
Add a Restart=always directive to Elasticsearch's systemd unit.
sudo systemctl edit elasticsearch.service
[Service]
# SZS/MLT Tweak
Restart=always
RestartSec=3
To apply the changes use the following commands.
sudo systemctl daemon-reload
sudo systemctl restart elasticsearch.service
systemctl status elasticsearch.service
systemctl cat elasticsearch.service
MediaWiki Setup
The main purpose of this guide is to show how to set up Elasticsearch for use by MediaWiki's extension CirrusSearch, and this section describes how to do that. In addition, the extension AdvancedSearch will be installed and configured.
If you have installed the extension PdfHandler (or some other file handling extension), CirrusSearch will show results from the files' content; the configuration below shows how to boost these results. How to configure the extension Translate to use Elasticsearch is described in MediaWiki's documentation, in the article Translation memories.
Install the Extensions Bundle
First, you need to install the extensions within MediaWiki's document root. The following example uses the Download from Git approach.
IP="/var/www/wiki.example.com" # The DocumentRoot directory of the wiki
OWNER="www-data" # The user that owns the $IP directory
BRANCH="REL1_39" # The MediaWiki's branch in use
cd "$IP/extensions"
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/AdvancedSearch --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/Elastica --branch ${BRANCH}
git clone https://gerrit.wikimedia.org/r/mediawiki/extensions/CirrusSearch --branch ${BRANCH}
sudo chown -R "${OWNER}:${OWNER}" Elastica/ CirrusSearch/
for ext in Elastica CirrusSearch; do (cd "$ext" && sudo -u "${OWNER}" composer install --no-dev); done
LocalSettings.php Configuration
Open the configuration file with your favorite editor and place the following lines at a suitable place (the end of the file is a good place). The example below shows the current configuration of this wiki. After the search index is built (next section), CirrusSearch should work even without the advanced setup. More options are described on the Extension:CirrusSearch page, and some undocumented options can be found in its extension.json file.
sudo nano "$IP/LocalSettings.php"
## Extension:AdvancedSearch
wfLoadExtension( 'AdvancedSearch' );
$wgAdvancedSearchDeepcatEnabled = false; // https://www.mediawiki.org/wiki/Topic:Uw036nwsilvb6w3t
$wgAdvancedSearchBetaFeature = false; // (enable it by default) https://m.mediawiki.org/wiki/Topic:Upflskaswcvrunka
$wgAdvancedSearchHighlighting = true; // https://www.mediawiki.org/wiki/Manual:Configuration_settings_(alphabetical)
$wgOpenSearchDescriptionLength = 2500; // https://www.mediawiki.org/wiki/Manual:$wgOpenSearchDescriptionLength
## Extension:Elastica
wfLoadExtension( 'Elastica' );
## Extension:CirrusSearch
wfLoadExtension( 'CirrusSearch' );
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
$wgDebugLogGroups['CirrusSearch'] = "$IP/cache/CirrusSearch.log";
// $wgCirrusSearchIndexBaseName = 'wiki_db_name'; // https://www.mediawiki.org/wiki/Extension:CirrusSearch#Configuration
// $wgCirrusSearchServers = [ '10.120.201.1' ]; // The address of the Elasticsearch server if it is not available at 'localhost'
## Extension:CirrusSearch Advanced Setup
$wgCirrusSearchRescoreProfile = 'classic_noboostlinks';
// $wgCirrusSearchFullTextQueryBuilderProfiles = 'perfield_builder';
// $wgCirrusSearchCompletionProfiles = 'normal';
$wgCirrusSearchPhraseSuggestUseText = true;
$wgCirrusSearchCompletionSuggesterHardLimit = 200; // 50
$wgCirrusSearchFragmentSize = 200;
// $wgCirrusExploreSimilarResults = true;
// Give much weight to the "file_text" in order to show
// results from the PDFs content. This requires PdfHandler
$wgCirrusSearchWeights = [
"title" => 20,
"redirect" => 15,
"category" => 8,
"heading" => 5,
"opening_text" => 3,
"text" => 5,
"auxiliary_text" => 15,
"file_text" => 25
];
// https://www.mediawiki.org/wiki/Help:Namespaces#Localisation
$wgCirrusSearchNamespaceWeights = [
"2" => 0.05,
"4" => 0.3,
"6" => 0.2,
"8" => 0.05,
"10" => 0.005,
"12" => 0.2,
"14" => 0.1
];
Build Search Index
How to build and update the CirrusSearch/Elasticsearch index is well described in the README and UPGRADE documents which come with the extension. Here the important steps for building the index for the first time are extracted and converted to scripts, one for a single wiki and one for a wiki family.
sudo nano /usr/local/bin/"mlw-cirrussearch-elasticsearch-build-index-single-wiki.sh"
#!/bin/bash
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @source https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README
# @home https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install Create executable file within and place this code as its content:
# /usr/local/bin/mlw-cirrussearch-elasticsearch-build-index-single-wiki.sh
#
# @desc Create Elasticsearch index for a single MediaWiki
: ${IP:="/var/www/wiki.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"} # The user that owns the $IP directory
CS_MAINT_DIR="${IP}/extensions/CirrusSearch/maintenance"
## STEP 1
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Disable CirrusSearch and Search update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' "${IP}/LocalSettings.php"
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' "${IP}/LocalSettings.php"
printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5
## STEP 2
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Generate Elasticsearch Index" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSearchIndexConfig.php" --startOver --conf "${IP}/LocalSettings.php"
## STEP 3
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Search Update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' "${IP}/LocalSettings.php"
printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5
## STEP 4
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Bootstrap the Search Index" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipLinks --indexOnSkip --conf "${IP}/LocalSettings.php"
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipParse --conf "${IP}/LocalSettings.php"
## STEP 5
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Cirrus Search" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' "${IP}/LocalSettings.php"
printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5
## Step 6
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Update Cirrus Search Suggestions (if the option is enabled in LocalSettings.php)" "${IP##*/}"
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSuggesterIndex.php" --conf "${IP}/LocalSettings.php"
## Step 7 - this is the most time-consuming step, you could skip it and run it later...
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Run Jobs Queue" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${IP}/maintenance/runJobs.php" --conf "${IP}/LocalSettings.php"
sudo nano /usr/local/bin/"mlw-cirrussearch-elasticsearch-build-index-wiki-family.sh"
#!/bin/bash
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @source https://phabricator.wikimedia.org/source/extension-cirrussearch/browse/master/README
# @home https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install Create executable file within and place this code as its content:
# /usr/local/bin/mlw-cirrussearch-elasticsearch-build-index-wiki-family.sh
#
# @desc Create Elasticsearch index for a MediaWiki Family
# Note: In this scenario the family members share the same DocumentRoot
# and LocalSettings.php file (and Apache2 virtual host configuration).
: ${IP:="/var/www/wiki-family.example.com"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"} # The user that owns the $IP directory
: ${WIKI_IDs:="bg" "en" "ru" "commons"} # The IDs of the wikis in the family (space-separated)
CS_MAINT_DIR="${IP}/extensions/CirrusSearch/maintenance"
## STEP 1
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Disable CirrusSearch and Search update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgSearchType#// $wgSearchType#' "${IP}/LocalSettings.php"
sudo -u "$OWNER" sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' "${IP}/LocalSettings.php"
printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5
## STEP 2
for WIKI_ID in ${WIKI_IDs[@]}
do
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Generate Elasticsearch Index" "${IP##*/}::${WIKI_ID}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSearchIndexConfig.php" --startOver --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
done
## STEP 3
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Search Update" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' "${IP}/LocalSettings.php"
printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5
## STEP 4
for WIKI_ID in ${WIKI_IDs[@]}
do
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Bootstrap the Search Index" "${IP##*/}::${WIKI_ID}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipLinks --indexOnSkip --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/ForceSearchIndex.php" --skipParse --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
sleep 2
done
## STEP 5
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Enable Cirrus Search" "${IP##*/}"
echo; sleep 5
sudo -u "$OWNER" sed -i 's#^// $wgSearchType#$wgSearchType#' "${IP}/LocalSettings.php"
printf -- '\n**\n%s\n*\n' "LocalSettings.php Audit"
sudo -u "$OWNER" grep '$wgSearchType\|$wgDisableSearchUpdate = true' "${IP}/LocalSettings.php"
echo; sleep 5
## Step 6
for WIKI_ID in ${WIKI_IDs[@]}
do
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Update Cirrus Search Suggestions (if the option is enabled in LocalSettings.php)" "${IP##*/}::${WIKI_ID}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${CS_MAINT_DIR}/UpdateSuggesterIndex.php" --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
done
## Step 7 - this is the most time-consuming step, you could skip it and run it later...
for WIKI_ID in ${WIKI_IDs[@]}
do
printf -- '\n**\n%s for Wiki:"%s" \n*\n\n' "Run Jobs Queue" "${IP##*/}::${WIKI_ID}"
echo; sleep 5
sudo -u "$OWNER" /usr/bin/php "${IP}/maintenance/runJobs.php" --wiki "${WIKI_ID}" #--conf "${IP}/LocalSettings.php"
done
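Steps 1, 3 and 5 of both scripts rely on sed to comment and uncomment two settings in LocalSettings.php. The sketch below replays those same expressions on a throwaway file so the effect can be inspected safely; the temporary file and the variable names step1 and final are illustrative assumptions, and GNU sed (as on Ubuntu) is assumed for the -i option.

```shell
#!/bin/bash
# Demonstration on a throwaway file; nothing here touches a real wiki.
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
// $wgDisableSearchUpdate = true;
$wgSearchType = 'CirrusSearch';
EOF

# STEP 1: comment out $wgSearchType and uncomment $wgDisableSearchUpdate.
sed -i 's#^$wgSearchType#// $wgSearchType#' "$demo_file"
sed -i 's#^// $wgDisableSearchUpdate#$wgDisableSearchUpdate#' "$demo_file"
step1="$(cat "$demo_file")"

# STEP 3: comment out $wgDisableSearchUpdate again.
sed -i 's#^$wgDisableSearchUpdate#// $wgDisableSearchUpdate#' "$demo_file"
# STEP 5: uncomment $wgSearchType again.
sed -i 's#^// $wgSearchType#$wgSearchType#' "$demo_file"
final="$(cat "$demo_file")"

printf 'After STEP 1:\n%s\n\nAfter STEP 5:\n%s\n' "$step1" "$final"
rm -f "$demo_file"
```

Note that the expressions are anchored with ^, so they only match when each setting starts at the beginning of its line and the commented form is exactly two slashes followed by one space, as in the LocalSettings.php example above.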
Update
When you update CirrusSearch and/or Elasticsearch there are several possible cases, which are well described in CirrusSearch's documentation files README and UPGRADE.
One possible way is to use the extension's maintenance scripts as shown below, or you can rebuild the entire search index as shown above.
cd "$IP/extensions/CirrusSearch"
sudo -u $OWNER php maintenance/UpdateSearchIndexConfig.php --reindexAndRemoveOk --indexIdentifier now
sudo -u $OWNER php maintenance/Metastore.php --upgrade
To monitor the filesystem changes during the update you can use a command like the following.
sudo watch "du -hs /var/lib/mysql/wiki_id && \
du -hs /var/lib/elasticsearch && \
php /var/www/wiki.metalevel.tech/maintenance/showJobs.php --type cirrusSearchElasticaWrite"
Additional Setup
Access Elasticsearch via SSH Tunnel
This approach is suitable only for testing purposes; here is a manual on how to set it up:
Elasticsearch Watch Scripts
Here are two example scripts that cover the following scenarios: [1] the Elasticsearch service runs on the same instance where it is used; and [2] the Elasticsearch service runs on another host (instance) and we must be sure it is available there.
sudo nano /usr/local/bin/"mlw-elasticsearch-watch-local.sh"
#!/bin/bash -e
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @home https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install Create executable file within and place this code as its content:
# /usr/local/bin/mlw-elasticsearch-watch-local.sh
#
# @desc Test whether Elasticsearch is accessible, if not attempt to restart it and send an email notification
: ${EMAIL_SENDER:="admin@example.com"} # The email address of the responsible person (the recipient)
: ${EMAIL_ADMIN:="admin@example.com"} # The email address which sends the email
: ${EMAIL_BODY:="/tmp/elasticsearch-watch.email.body"} # Temporary file where the email body will be stored
if /usr/bin/curl 'http://127.0.0.1:9200' 2>&1 | /bin/grep -q 'Connection refused'
then
{
/bin/date
echo
echo "ElasticSearch failed and will be restarted..."
/usr/bin/systemctl start elasticsearch.service
/usr/bin/systemctl restart elasticsearch.service
} > "$EMAIL_BODY" 2>&1
/usr/bin/mail -r "ElasticSearch Watch ${EMAIL_ADMIN}" \
-s "ElasticSearch was Restarted" "${EMAIL_SENDER}" \
-a "MIME-Version: 1.0" -a "Content-Type: text/html; charset=UTF-8" < "$EMAIL_BODY"
fi
sudo nano /usr/local/bin/"mlw-elasticsearch-watch-remote.sh"
#!/bin/bash -e
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2021 Spas Z. Spasov
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @home https://wiki.metalevel.tech/wiki/Elasticsearch_and_MediaWiki_CirrusSearch
# @install Create executable file within and place this code as its content:
# /usr/local/bin/mlw-elasticsearch-watch-remote.sh
#
# @desc Test whether Elasticsearch is accessible, if not attempt to restart it and send an email notification
# Here the test is done via SSH login to a remote instance where Elasticsearch is used
: ${EMAIL_SENDER:="admin@example.com"} # The email address of the responsible person (the recipient)
: ${EMAIL_ADMIN:="admin@example.com"} # The email address which sends the email
: ${EMAIL_BODY:="/tmp/elasticsearch-watch.email.body"} # Temporary file where the email body will be stored
# Note: Bash always sets HOSTNAME to the local host name, so a separate variable is used here
: ${REMOTE_HOST:="example.com"} # A hostname defined in the ssh/config file
if /usr/bin/ssh "$REMOTE_HOST" "curl 'http://127.0.0.1:9200' 2>&1" | /bin/grep -q 'Connection refused'
then
{
/bin/date
echo
echo "ElasticSearch on remote instance - ${REMOTE_HOST} failed, and will be restarted..."
/usr/bin/systemctl start autossh-port-forward.service
/usr/bin/systemctl start elasticsearch.service
/usr/bin/systemctl restart autossh-port-forward.service
/usr/bin/systemctl restart elasticsearch.service
} > "$EMAIL_BODY" 2>&1
/usr/bin/mail -r "ElasticSearch Watch ${EMAIL_ADMIN}" \
-s "ElasticSearch was Restarted" "${EMAIL_SENDER}" \
-a "MIME-Version: 1.0" -a "Content-Type: text/html; charset=UTF-8" < "$EMAIL_BODY"
fi
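Both watch scripts detect a failure by searching curl's output for the text "Connection refused", which does not cover other failure modes such as a timeout. As an alternative technique (not the one used in the scripts above), the check can rely on curl's exit status instead. This is a minimal sketch; the function name es_is_up is hypothetical, and it assumes the node answers plain HTTP requests without authentication, as a default Elasticsearch 7.10 installation does.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if the endpoint answers with an HTTP success status.
es_is_up() {
    local url="${1:-http://127.0.0.1:9200}"
    # --fail: non-zero exit on HTTP errors; --max-time: do not hang on a stalled node.
    curl --silent --fail --max-time 5 --output /dev/null "$url"
}

if es_is_up 'http://127.0.0.1:9200'; then
    echo "Elasticsearch answers"
else
    echo "Elasticsearch is not reachable"
fi
```

With this variant any non-zero exit (refused connection, timeout, HTTP error) is treated as a failure, which may or may not be what you want before triggering a restart.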
To run the test periodically you can create a simple systemd service and timer. A pretty simple example of this approach can be found in Gunicorn's documentation. Another way is to create a crontab entry as follows.
sudo crontab -e
# ElasticSearch Watch
*/5 * * * * /usr/local/bin/mlw-elasticsearch-watch-local.sh
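For the systemd service and timer alternative mentioned above, the following sketch writes a oneshot service and a matching timer that fires every five minutes. The unit names (elasticsearch-watch.service, elasticsearch-watch.timer) and the function name write_watch_units are assumptions made for illustration; the default script path is the local watch script from this section.

```shell
#!/bin/bash
# Hypothetical helper: write a oneshot service and a 5-minute timer into a directory.
write_watch_units() {
    local unit_dir="$1"
    local script="${2:-/usr/local/bin/mlw-elasticsearch-watch-local.sh}"

    # The service runs the watch script once per activation.
    cat > "${unit_dir}/elasticsearch-watch.service" <<EOF
[Unit]
Description=Elasticsearch watch (one check per activation)

[Service]
Type=oneshot
ExecStart=${script}
EOF

    # The timer activates the service with the same base name.
    cat > "${unit_dir}/elasticsearch-watch.timer" <<'EOF'
[Unit]
Description=Run the Elasticsearch watch every 5 minutes

[Timer]
OnCalendar=*:0/5
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Usage (run as root; /etc/systemd/system is the usual place for local units):
# write_watch_units /etc/systemd/system
# systemctl daemon-reload
# systemctl enable --now elasticsearch-watch.timer
```

Compared to cron, the timer's runs show up in systemctl list-timers and the script's output is collected by the journal.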
Disable CirrusSearch via PHP if Elasticsearch is not available
Another thing you can do to make sure your wiki works correctly is to check within LocalSettings.php whether Elasticsearch is available. This can be done by implementing the following code.
<?php
function isPortOpen($ipAddress, $portToCheck) {
$fp = @fsockopen($ipAddress, $portToCheck, $errno, $errstr, 0.1);
if (!$fp) {
return false;
} else {
fclose($fp);
return true;
}
}
if (isPortOpen('127.0.0.1', 9300)) {
echo '9300 Open';
} else {
echo '9300 Closed';
}
$wgSearchType = 'CirrusSearch';
The LocalSettings.php implementation could look like:
if ($fp = @fsockopen('127.0.0.1', 9300, $errno, $errstr, 0.1)) {
fclose($fp);
$wgSearchType = 'CirrusSearch';
}
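When debugging this check it can help to test the same port from the command line. The sketch below is a rough shell counterpart of the PHP test, not a replacement for it; the function name is_port_open is reused here only for illustration, and it assumes Bash was built with /dev/tcp support (the default on Ubuntu) and that the timeout utility from coreutils is installed.

```shell
#!/bin/bash
# Hypothetical helper: return 0 if a TCP connection to host:port can be opened within 1 second.
is_port_open() {
    local host="$1" port="$2"
    # Bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT.
    timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

if is_port_open 127.0.0.1 9300; then
    echo '9300 Open'
else
    echo '9300 Closed'
fi
```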
References
- BitLaunch: How to install Elasticsearch on Ubuntu 20.04 LTS
- Computing for Geeks: Install Elasticsearch 6.x on Ubuntu 18.04 LTS
- Media Wiki: Extension:CirrusSearch
- Phabricator: Extension:CirrusSearch
- Media Wiki: CirrusSearch Talk – Java version compatibility
- Mincong's blog: GC in Elasticsearch – Basic information about garbage collection (GC) in Elasticsearch, JVM options, GC logging
- Foojay.io: Handling JDK & GC Options Dynamically in Elasticsearch
- Elasticsearch Documentation: Important Elasticsearch configuration
- Elasticsearch Documentation: GC logging