MediaWiki Job Queue
Latest revision as of 22:31, 5 March 2023
I wrote this article while migrating this wiki from MediaWiki version 1.38 to version 1.39. According to the CirrusSearch extension's page, I also needed to migrate from Elasticsearch version 6.8.23 to version 7.10.2.
After installing Elasticsearch version 7.10.2, instead of following the Upgrade manual, I rebuilt the Elasticsearch data from scratch, which left me with a seemingly endless MediaWiki job queue. For this reason I needed to become much more familiar with the MediaWiki maintenance scripts related to the job queue.
Envvars
The environment variables used in the following commands.
IP="/var/www/wiki.metalevel.tech" # The DocumentRoot directory of the wiki
OWNER="www-data" # The user that owns the $IP directory
Note: in the examples below, ${IP##*/} is used in place of the name of the specific wiki.
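As a small illustration of that expansion, the sketch below prints what `${IP##*/}` evaluates to for the `IP` value from the Envvars section. The variable name `WIKI_NAME` is introduced here only for the example.

```shell
IP="/var/www/wiki.metalevel.tech"   # same value as in the Envvars section

# ${IP##*/} strips the longest prefix matching "*/",
# leaving only the last path component.
WIKI_NAME="${IP##*/}"               # hypothetical helper variable

echo "$WIKI_NAME"                                              # wiki.metalevel.tech
echo "/usr/local/bin/mlw-service-runJobs-for-${WIKI_NAME}.sh"  # a script path used later in this article
```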
Show Jobs
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php"
15 # The count of the all pending jobs
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --group
cirrusSearchLinksUpdatePrioritized: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
htmlCacheUpdate: 4 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
recentChangesUpdate: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
refreshLinksDynamic: 7 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
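If you want a single number from the grouped output, it can be summed with awk. This is only a sketch: the function name `sum_queued` is hypothetical, and it assumes the `--group` output keeps the line format shown above.

```shell
# Hypothetical helper: sum the "queued" counts from
# `showJobs.php --group` output read on stdin.
sum_queued() {
    awk '$3 == "queued;" { sum += $2 } END { print sum + 0 }'
}

# Sample input copied from the output above; prints 14.
sum_queued <<'EOF'
cirrusSearchLinksUpdatePrioritized: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
htmlCacheUpdate: 4 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
recentChangesUpdate: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
refreshLinksDynamic: 7 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
EOF
```

On a live wiki the same function could be fed directly: `sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --group | sum_queued`.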
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --list
cirrusSearchLinksUpdatePrioritized MediaWiki_Job_Queue addedLinks=[] removedLinks=[] prioritize=1 cluster= namespace=0 title=MediaWiki_Job_Queue requestId=ZASOXkzdk5n0fxsD5JVbLgAAQRA (id=6794955,timestamp=20230305124311) status=unclaimed
...
htmlCacheUpdate MediaWiki_Job_Queue table=templatelinks recursive=1 rootJobIsSelf=1 rootJobSignature=fcd541151cba4e3ac5b300da797a3163c93407bc rootJobTimestamp=20230305124311 causeAction=page-edit namespace=0 title=MediaWiki_Job_Queue requestId=ZASOXkzdk5n0fxsD5JVbLgAAQRA causeAgent=unknown (id=6794956,timestamp=20230305124311) status=unclaimed
...
recentChangesUpdate Special:RecentChanges type=cacheUpdate namespace=-1 title=RecentChanges requestId=ZASOUUzdk5n0fxsD5JVa-QAAUQs (id=6794954,timestamp=20230305124307) status=unclaimed
refreshLinksDynamic Kali_Linux_Install_GUFW_(gui-ufw) isOpportunistic=1 rootJobTimestamp=20230305121652 namespace=0 title=Kali_Linux_Install_GUFW_(gui-ufw) requestId=ZASIM9BBdr0qUp1CuMirLwAAABA causeAction=unknown causeAgent=unknown (id=6794945,timestamp=20230305121653) status=unclaimed
...
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --type cirrusSearchLinksUpdatePrioritized
2 # The count of the pending jobs for the specified --type
If the wiki is part of a wiki family, you may need to specify the $wikiId (in most cases the value is the same as $wgDBname), as shown below.
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --wiki="${wikiId}"
Run Jobs
For automation, see the sections Job Queue Service and Job Queue Cron Job below. To trigger the queue manually, you can use the following command.
sudo -u ${OWNER} php "${IP}/maintenance/runJobs.php" --conf "${IP}/LocalSettings.php" --maxjobs=2000
All available options are listed on Manual:runJobs.php.
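When the queue is large, it may be more comfortable to run it in batches and stop once it is empty. The following is a minimal sketch, not a tested production script: the function names, `BATCH` and `MAX_ROUNDS` are hypothetical, and it assumes that showJobs.php without options prints only the pending count, as in the Show Jobs section.

```shell
#!/bin/bash
: "${IP:=/var/www/wiki.metalevel.tech}"   # The DocumentRoot directory of the wiki
: "${OWNER:=www-data}"                    # The user that owns the $IP directory
: "${BATCH:=200}"                         # Assumption: jobs to run per round
: "${MAX_ROUNDS:=50}"                     # Assumption: safety cap against an endless queue

# Print the number of pending jobs
pending_jobs() {
    sudo -u "$OWNER" php "$IP/maintenance/showJobs.php"
}

# Run one batch of jobs
run_batch() {
    sudo -u "$OWNER" php "$IP/maintenance/runJobs.php" --maxjobs="$BATCH"
}

# Run batches until the queue is empty or MAX_ROUNDS is reached
drain_queue() {
    local round pending
    for ((round = 1; round <= MAX_ROUNDS; round++)); do
        pending="$(pending_jobs)" || return 1
        if ! [[ $pending =~ ^[0-9]+$ ]]; then
            echo "Unexpected showJobs.php output: $pending" >&2
            return 1
        fi
        if ((pending == 0)); then
            echo "Queue is empty after $((round - 1)) round(s)."
            return 0
        fi
        echo "Round $round: $pending pending job(s)"
        run_batch || return 1
    done
    echo "Stopped after $MAX_ROUNDS rounds; jobs may still be pending." >&2
    return 2
}
```

Call `drain_queue` after sourcing the file. The round cap is there because jobs can enqueue further jobs, so the count may never reach zero.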
Manage Jobs
Re-push abandoned jobs of a certain type.
sudo -u ${OWNER} php "${IP}/maintenance/manageJobs.php" --type typeName --action "repush-abandoned"
Delete jobs of a certain type.
sudo -u ${OWNER} php "${IP}/maintenance/manageJobs.php" --type typeName --action "delete"
Use showJobs.php --group, as shown above, to find the job types available in the current job queue. Note that the quotation marks around the action values in the examples above are there only for better highlighting.
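The two steps can be combined by extracting the job types that report abandoned jobs and re-pushing each of them. This is a sketch under the assumption that the `--group` output keeps the format shown in the Show Jobs section. The function name `abandoned_types` is hypothetical, and the sample lines are invented for illustration.

```shell
# Hypothetical helper: print the job types whose "abandoned" count is
# greater than zero, reading `showJobs.php --group` output on stdin.
abandoned_types() {
    awk '$9 ~ /^abandoned/ && $8 + 0 > 0 { sub(/:$/, "", $1); print $1 }'
}

# Invented sample input; prints "refreshLinks".
abandoned_types <<'EOF'
htmlCacheUpdate: 4 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
refreshLinks: 0 queued; 3 claimed (0 active, 3 abandoned); 0 delayed
EOF

# Usage sketch on a live wiki (not executed here):
# sudo -u "$OWNER" php "$IP/maintenance/showJobs.php" --group | abandoned_types |
#     while read -r type; do
#         sudo -u "$OWNER" php "$IP/maintenance/manageJobs.php" --type "$type" --action repush-abandoned
#     done
```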
Job Queue Service
This section is based on MediaWiki's Manual:Job queue. See also Freedesktop.org > Systemd.service.
1. Create the following script and make it executable.
/usr/local/bin/mlw-service-runJobs-for-${IP##*/}.sh
#!/bin/bash
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2023 Spas Z. Spasov
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name mlw-service-runJobs-for-${IP##*/}.sh
#
# @desc Run the job queue "$IP/maintenance/runJobs.php" for a specific (single) wiki.
# The script is designed to be used as a service,
# so it will run in a loop and will not exit until the server is restarted.
#
# @reference https://wiki.metalevel.tech/wiki/MediaWiki_Job_Queue
: ${IP:="/var/www/wiki.metalevel.tech"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"} # The user that owns the $IP directory
# The log file, it will be used only if the script is called with "log" argument
: ${LOG_FILE:="/tmp/mlw-service-runJobs-${IP##*/}.log"}
: ${RUN_JOBS_SLEEP:=20} # The number of seconds to wait before the next loop
: ${RUN_JOBS_MAXJOBS:=20} # The number of jobs to run on each loop
: ${RUN_JOBS_MEM:="max"} # The memory limit for each job
if [[ $1 == "log" ]]; then
LOG="$LOG_FILE"
else
LOG="/dev/null"
fi
# Prefix the commands with sudo only when the script is not already running as $OWNER
SCRIPT_UID=$(id -u "$OWNER")
if [[ $EUID -ne $SCRIPT_UID ]]; then
CMD_PREFIX="sudo -u $OWNER"
else
CMD_PREFIX=""
fi
RUN_JOBS_CMD="$CMD_PREFIX /usr/bin/php -dmemory_limit=-1 $IP/maintenance/runJobs.php --conf $IP/LocalSettings.php --memory-limit=$RUN_JOBS_MEM"
RUN_JOBS_WIKI="WIKI: ${IP##*/}"
PENDING_JOBS_CMD="$CMD_PREFIX /usr/bin/php $IP/maintenance/showJobs.php --conf $IP/LocalSettings.php"
function runJobs() {
printf -- '\n*\n*\n* %s - RunJobs started at %s ------------\n*\n\n' "$RUN_JOBS_WIKI" "$(date +%Y-%m-%d_%Hh:%Mm)"
while true; do
# Job types that need to be run ASAP no matter how many of them are in the queue. Those jobs should be very "cheap" to run.
printf -- '\n\n*\n* Execute the urgent jobs at %s ... Pending jobs: %s ...\n*\n\n' "$(date +%Y-%m-%d_%Hh:%Mm)" "$($PENDING_JOBS_CMD || echo 0)"
$RUN_JOBS_CMD --type="enotifNotify"
# Everything else, limit the number of jobs on each batch
# The --wait parameter will pause the execution here until new jobs are added,
# to avoid running the loop without anything to do
printf -- '\n\n*\n* Wait for jobs. Run max %s jobs at once then proceed ... Date: %s ... Pending jobs: %s ...\n*\n\n' "$RUN_JOBS_MAXJOBS" "$(date +%Y-%m-%d_%Hh:%Mm)" "$($PENDING_JOBS_CMD || echo 0)"
$RUN_JOBS_CMD --wait --maxjobs=$RUN_JOBS_MAXJOBS
# Wait some seconds to let the CPU do other things, like handling web requests, etc
printf -- '\n\n*\n* Waiting for %s seconds at %s ... Pending jobs: %s ...\n*\n\n' "$RUN_JOBS_SLEEP" "$(date +%Y-%m-%d_%Hh:%Mm)" "$($PENDING_JOBS_CMD || echo 0)"
sleep $RUN_JOBS_SLEEP
done
}
# Wait a minute after the server starts up
# to give other processes time to get started,
# then start the job queue loop.
sleep 60
# Redirect stdout first and then stderr, so that both streams reach $LOG
runJobs >"$LOG" 2>&1
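Bash processes redirections from left to right, so their order decides whether error output ends up in the log file. The short demonstration below is independent of MediaWiki. The function `emit` and the temporary file names are made up for the example.

```shell
# Hypothetical stand-in for a command that writes to both streams
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

log_a="$(mktemp)"
log_b="$(mktemp)"

emit >"$log_a" 2>&1   # stdout goes to the file, then stderr follows stdout: both lines are logged
emit 2>&1 >"$log_b"   # stderr is pointed at the old stdout first: only "to stdout" is logged

cat "$log_a"          # to stdout, to stderr
cat "$log_b"          # to stdout
```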
2. Create the systemd service unit. Adjust the value of User= (www-data or php-fpm), replace ${IP##*/} (which in the examples stands for the name of the wiki) with the actual name, and decide whether to keep the "log" option; in most cases you do not need it.
/etc/systemd/system/mlw-service-runJobs-for-${IP##*/}.service
[Unit]
Description=MediaWiki Job Runner for ${IP##*/}
[Service]
ExecStart=/usr/local/bin/mlw-service-runJobs-for-${IP##*/}.sh "log"
Nice=10
ProtectSystem=full
User=www-data
OOMScoreAdjust=200
StandardOutput=journal
[Install]
WantedBy=multi-user.target
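Since systemd does not perform bash-style expansions such as `${IP##*/}` inside unit files, one option is to render the unit from the shell, where the expansion does work. The function name `render_unit` is hypothetical. The unit content is the one shown above, with blank lines added between the sections.

```shell
: "${IP:=/var/www/wiki.metalevel.tech}"   # The DocumentRoot directory of the wiki

# Hypothetical helper: print the unit file with the wiki name filled in
render_unit() {
    local wiki="${IP##*/}"
    cat <<EOF
[Unit]
Description=MediaWiki Job Runner for ${wiki}

[Service]
ExecStart=/usr/local/bin/mlw-service-runJobs-for-${wiki}.sh "log"
Nice=10
ProtectSystem=full
User=www-data
OOMScoreAdjust=200
StandardOutput=journal

[Install]
WantedBy=multi-user.target
EOF
}
```

The output could then be written into place, for example with `render_unit | sudo tee "/etc/systemd/system/mlw-service-runJobs-for-${IP##*/}.service" >/dev/null`, followed by `sudo systemctl daemon-reload`.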
Enable and start the systemd service.
sudo systemctl enable --now mlw-service-runJobs-for-${IP##*/}.service
sudo systemctl status mlw-service-runJobs-for-${IP##*/}.service
Now if the "log" option is enabled you can tail the output log file.
tail -f /tmp/mlw-service-runJobs-${IP##*/}.log
Job Queue Cron Job
Before I did the necessary investigation and wrote this article, I dealt with the job queue using the following script, triggered by a cron job.
/usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh
#!/bin/bash
# @author Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name: /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh
# @desc Run the job queue: $IP/maintenance/runJobs.php
#
# Crontab:
# * * * * * sudo -u www-data /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh >/var/log/cron.mlw-maintenance-runJobs-${IP##*/}.sh.log 2>&1
: ${IP:="/var/www/wiki.metalevel.tech"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"} # The user that owns the $IP directory
# Pass any argument to activate CLI mode (TEST_DEPTH=2);
# otherwise the script falls back to cron mode (TEST_DEPTH=4)
if [[ -z $1 ]]; then
TEST_DEPTH=4
else
TEST_DEPTH=2
fi
echo ''
echo '*'
echo "* week_$(date +%W.%Y-%m-%d_%Hh:%Mm)"
echo '*'
if ps aux | grep -v 'grep' | grep -oq 'mlw-maintenance-rebuild'; then
echo "Some of our 'mlw-maintenance-rebuild-*.sh' is running, try again later..."
exit
fi
if [[ "$(ps aux | grep -v 'grep' | grep -c "$0")" -eq "${TEST_DEPTH}" ]]; then
printf -- '\n*\n*\n* WIKI: %s - RunJobs begin. ------------\n\n' "${IP##*/}"
#sudo chown -R www-data:www-data $IP/cache
sudo -u "$OWNER" /usr/bin/php -dmemory_limit=-1 $IP/maintenance/runJobs.php --conf $IP/LocalSettings.php
else
echo "Another instance of ${0} is running... Skip."
echo "Use 'cli' as an argument to activate the CLI mode; otherwise the script falls back to cron mode."
echo "Test for other instances: ${TEST_DEPTH} ?= $(ps aux | grep -v 'grep' | grep -c "$0")"
fi
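The script above detects other running instances by counting `ps` matches, which depends on how it was launched (hence the two TEST_DEPTH values). As a possible alternative, not what the script itself does, a lock file with flock(1) from util-linux can express the same "skip if already running" idea. The function name `run_exclusive` and the lock file path are assumptions made for this sketch.

```shell
# Assumption: a lock file path that the invoking user can write to
: "${LOCK_FILE:=/tmp/mlw-maintenance-runJobs.lock}"

# Hypothetical helper: run the given command only if no other
# instance currently holds the lock; otherwise skip.
run_exclusive() {
    (
        # Try to take an exclusive lock on file descriptor 9 without waiting
        flock -n 9 || { echo "Another instance is running... Skip." >&2; exit 1; }
        "$@"
    ) 9>"$LOCK_FILE"
}

# Usage sketch (not executed here):
# run_exclusive sudo -u "$OWNER" /usr/bin/php -dmemory_limit=-1 \
#     "$IP/maintenance/runJobs.php" --conf "$IP/LocalSettings.php"
```

The lock is released automatically when the subshell exits, so no cleanup step is needed even if the job runner is interrupted.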
Examples of the Crontab entries – with and without logging.
sudo crontab -e
*/5 * * * * /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh >/dev/null 2>&1
#*/5 * * * * /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh >/var/log/cron.mw-maintenance-runJobs.log 2>&1
Example of CLI usage.
mlw-maintenance-runJobs-${IP##*/}.sh cli
References
- Continuous service examples:
  - MediaWiki: Manual:Job queue
  - Semantic MediaWiki: Help:Job queue
- MediaWiki: Manual:runJobs.php
- MediaWiki: Manual:showJobs.php
- MediaWiki: Manual:manageJobs.php
- MediaWiki Docs: JobQueue | JobQueue Architecture | Manual:Job queue/For developers
- MediaWiki Support desk: How to remove abandoned jobs from job queue?