MediaWiki Job Queue

From WikiMLT

I am writing this article while migrating this wiki from MediaWiki version 1.38 to version 1.39. According to the CirrusSearch extension's page, this also required migrating from Elasticsearch version 6.8.23 to version 7.10.2.

So, after installing Elasticsearch version 7.10.2, instead of following the Upgrade manual I rebuilt the Elasticsearch data from scratch, which left me with a seemingly endless MediaWiki job queue. For this reason I needed to become much more familiar with the MediaWiki maintenance scripts related to the job queue.

Envvars

The environment variables used in the following commands.

IP="/var/www/wiki.metalevel.tech" # The DocumentRoot directory of the wiki
OWNER="www-data"                  # The user that owns the $IP directory

Note, in the examples below, ${IP##*/} is used instead of the name of the particular wiki.
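
For readers unfamiliar with the notation, ${IP##*/} is a standard shell parameter expansion that removes the longest prefix matching */ and so leaves only the last path component. A minimal sketch, where the variable name WIKI_NAME is introduced here only for illustration:

```shell
IP="/var/www/wiki.metalevel.tech"   # Same value as in the Envvars section

# Strip everything up to and including the last "/" to get the wiki name
WIKI_NAME="${IP##*/}"

echo "$WIKI_NAME"   # prints: wiki.metalevel.tech
```

Note that a trailing slash in $IP would yield an empty string, so the path should be written without one.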

Show Jobs

sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php"
15    # The count of all pending jobs
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --group
cirrusSearchLinksUpdatePrioritized: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
htmlCacheUpdate: 4 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
recentChangesUpdate: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
refreshLinksDynamic: 7 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --list
cirrusSearchLinksUpdatePrioritized MediaWiki_Job_Queue addedLinks=[] removedLinks=[] prioritize=1 cluster= namespace=0 title=MediaWiki_Job_Queue requestId=ZASOXkzdk5n0fxsD5JVbLgAAQRA (id=6794955,timestamp=20230305124311) status=unclaimed
...
htmlCacheUpdate MediaWiki_Job_Queue table=templatelinks recursive=1 rootJobIsSelf=1 rootJobSignature=fcd541151cba4e3ac5b300da797a3163c93407bc rootJobTimestamp=20230305124311 causeAction=page-edit namespace=0 title=MediaWiki_Job_Queue requestId=ZASOXkzdk5n0fxsD5JVbLgAAQRA causeAgent=unknown (id=6794956,timestamp=20230305124311) status=unclaimed
...
recentChangesUpdate Special:RecentChanges type=cacheUpdate namespace=-1 title=RecentChanges requestId=ZASOUUzdk5n0fxsD5JVa-QAAUQs (id=6794954,timestamp=20230305124307) status=unclaimed
refreshLinksDynamic Kali_Linux_Install_GUFW_(gui-ufw) isOpportunistic=1 rootJobTimestamp=20230305121652 namespace=0 title=Kali_Linux_Install_GUFW_(gui-ufw) requestId=ZASIM9BBdr0qUp1CuMirLwAAABA causeAction=unknown causeAgent=unknown (id=6794945,timestamp=20230305121653) status=unclaimed
...
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --type cirrusSearchLinksUpdatePrioritized
2    # The count of the pending jobs for the specified --type

If the wiki is part of a wiki family, you may need to specify the $wikiId (in most cases the value is the same as $wgDBname), as shown below.

sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --wiki="${wikiId}"
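
When only the total number of queued jobs is of interest, the --group output can be summarized with a small filter. The following is a sketch that assumes the "type: N queued; ..." line format shown above; the function name sum_queued is hypothetical and not part of MediaWiki.

```shell
# Hypothetical helper: sum the "N queued" counts from the output of
# `showJobs.php --group`, read on stdin.
# Assumes the "type: N queued; ..." line format.
sum_queued() {
    awk '
        match($0, /[0-9]+ queued/) {
            # The matched text is e.g. "4 queued"; adding 0 keeps the number
            total += substr($0, RSTART, RLENGTH) + 0
        }
        END { print total + 0 }
    '
}

# Usage (assumes the IP and OWNER variables from the Envvars section):
# sudo -u "${OWNER}" php "${IP}/maintenance/showJobs.php" --group | sum_queued
```

For the four sample lines of --group output shown above (2, 4, 1 and 7 queued) this prints 14.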

Run Jobs

For automation, see the sections Job Queue Service and Job Queue Cron Job below. To trigger the queue manually, you can use the following command.

sudo -u ${OWNER} php "${IP}/maintenance/runJobs.php" --conf "${IP}/LocalSettings.php" --maxjobs=2000

All available options are listed on Manual:runJobs.php.
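
A large backlog, like the one described in the introduction, can also be worked through in bounded batches rather than in a single long run. The sketch below is one possible way to do it; drain_queue, SHOW_JOBS_CMD and RUN_JOBS_CMD are names assumed here for illustration, and only the showJobs.php and runJobs.php scripts and the --maxjobs option come from the examples above.

```shell
# Hypothetical helper: run batches until the pending count reaches 0.
# SHOW_JOBS_CMD must print the pending count; RUN_JOBS_CMD runs one batch.
# $1 = jobs per batch (default 2000), $2 = maximum number of batches (default 10)
drain_queue() {
    batch="${1:-2000}"
    rounds_left="${2:-10}"
    while :; do
        pending="$($SHOW_JOBS_CMD)" || return 1
        case "$pending" in
            '' | *[!0-9]*) return 1 ;;      # Unexpected output, stop
        esac
        [ "$pending" -eq 0 ] && return 0    # Queue is empty
        [ "$rounds_left" -gt 0 ] || return 2 # Batch limit reached, jobs remain
        $RUN_JOBS_CMD --maxjobs="$batch" || return 1
        rounds_left=$((rounds_left - 1))
    done
}

# Usage (assumes the IP and OWNER variables from the Envvars section):
# SHOW_JOBS_CMD="sudo -u ${OWNER} php ${IP}/maintenance/showJobs.php"
# RUN_JOBS_CMD="sudo -u ${OWNER} php ${IP}/maintenance/runJobs.php"
# drain_queue 2000 10
```

The batch limit is a deliberate choice: if jobs keep spawning new jobs, the helper returns with status 2 instead of looping forever.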

Manage Jobs

Re-push abandoned jobs of a certain type.

sudo -u ${OWNER} php "${IP}/maintenance/manageJobs.php" --type typeName --action "repush-abandoned"

Delete jobs of a certain type.

sudo -u ${OWNER} php "${IP}/maintenance/manageJobs.php" --type typeName --action "delete"

Use showJobs.php --group, as shown above, to find the job types available in the current job queue. Note that the quote marks around the action values in the examples above are used only for better highlighting.
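
To decide which types are worth re-pushing, the --group output can be filtered for types that report abandoned jobs. A minimal sketch, assuming the "type: ... (A active, B abandoned) ..." line format shown in the Show Jobs section; abandoned_types is a hypothetical helper name.

```shell
# Hypothetical helper: print the job types that have at least one abandoned
# job, reading the output of `showJobs.php --group` on stdin.
abandoned_types() {
    awk -F': ' '
        match($0, /[0-9]+ abandoned/) {
            # The matched text is e.g. "2 abandoned"; adding 0 keeps the number
            if (substr($0, RSTART, RLENGTH) + 0 > 0) print $1
        }
    '
}

# Usage (assumes the IP and OWNER variables from the Envvars section):
# sudo -u "${OWNER}" php "${IP}/maintenance/showJobs.php" --group | abandoned_types |
#     while read -r type; do
#         sudo -u "${OWNER}" php "${IP}/maintenance/manageJobs.php" --type "$type" --action repush-abandoned
#     done
```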

Job Queue Service

This section is based on MediaWiki's Manual:Job queue. See also Freedesktop.org > Systemd.service.

1. Create the following script and make it executable.

/usr/local/bin/mlw-service-runJobs-for-${IP##*/}.sh
#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2023 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name 	 mlw-service-runJobs-for-${IP##*/}.sh
#
# @desc 	 Run the job queue "$IP/maintenance/runJobs.php" for a specific (single) wiki.
#            The script is designed to be used as a service,
#            so it will run in a loop and will not exit until the server is restarted.
#
# @reference https://wiki.metalevel.tech/wiki/MediaWiki_Job_Queue

: ${IP:="/var/www/wiki.metalevel.tech"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}                  # The user that owns the $IP directory
# The log file, it will be used only if the script is called with "log" argument
: ${LOG_FILE:="/tmp/mlw-service-runJobs-${IP##*/}.log"}

: ${RUN_JOBS_SLEEP:=20}   # The number of seconds to wait before the next loop
: ${RUN_JOBS_MAXJOBS:=20} # The number of jobs to run on each loop
: ${RUN_JOBS_MEM:="max"}  # The memory limit for each job

if [[ $1 == "log" ]]; then
    LOG="$LOG_FILE"
else
    LOG="/dev/null"
fi

SCRIPT_UID=$(id -u "$OWNER")
# Use sudo only when the script is not already running as $OWNER
if [[ $EUID -ne $SCRIPT_UID ]]; then
    CMD_PREFIX="sudo -u $OWNER"
else
    CMD_PREFIX=""
fi

RUN_JOBS_CMD="$CMD_PREFIX /usr/bin/php -dmemory_limit=-1 $IP/maintenance/runJobs.php --conf $IP/LocalSettings.php --memory-limit=$RUN_JOBS_MEM"
RUN_JOBS_WIKI="WIKI: ${IP##*/}"
PENDING_JOBS_CMD="$CMD_PREFIX /usr/bin/php $IP/maintenance/showJobs.php --conf $IP/LocalSettings.php"

function runJobs() {
    printf -- '\n*\n*\n* %s - RunJobs started at %s ------------\n*\n\n' "$RUN_JOBS_WIKI" "$(date +%Y-%m-%d_%Hh:%Mm)"

    while true; do
        # Job types that need to be run ASAP no matter how many of them are in the queue. Those jobs should be very "cheap" to run.
        printf -- '\n\n*\n* Execute the urgent jobs at %s ... Pending jobs: %s ...\n*\n\n' "$(date +%Y-%m-%d_%Hh:%Mm)" "$($PENDING_JOBS_CMD || echo 0)"
        $RUN_JOBS_CMD --type="enotifNotify"
        # Everything else, limit the number of jobs on each batch
        # The --wait parameter will pause the execution here until new jobs are added,
        # to avoid running the loop without anything to do
        printf -- '\n\n*\n* Wait for jobs. Run max %s jobs at once then proceed ... Date: %s ... Pending jobs: %s ...\n*\n\n' "$RUN_JOBS_MAXJOBS" "$(date +%Y-%m-%d_%Hh:%Mm)" "$($PENDING_JOBS_CMD || echo 0)"
        $RUN_JOBS_CMD --wait --maxjobs=$RUN_JOBS_MAXJOBS
        # Wait some seconds to let the CPU do other things, like handling web requests, etc
        printf -- '\n\n*\n* Waiting for %s seconds at %s ... Pending jobs: %s ...\n*\n\n' "$RUN_JOBS_SLEEP" "$(date +%Y-%m-%d_%Hh:%Mm)" "$($PENDING_JOBS_CMD || echo 0)"
        sleep $RUN_JOBS_SLEEP
    done
}

# Wait a minute after the server starts up
# to give other processes time to get started,
# then start the job queue loop.
sleep 60
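# Note the redirection order: stdout goes to $LOG,
# while stderr stays on the original stdout of the script.
runJobs 2>&1 >"$LOG"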

2. Create the systemd service unit. Adjust the value of User= (www-data or php-fpm), replace ${IP##*/} (which in the examples stands for the name of the wiki) with the actual name, and decide whether to keep the "log" option; in most cases you do not need it.

/etc/systemd/system/mlw-service-runJobs-for-${IP##*/}.service
[Unit]
Description=MediaWiki Job Runner for ${IP##*/}

[Service]
ExecStart=/usr/local/bin/mlw-service-runJobs-for-${IP##*/}.sh "log"
Nice=10
ProtectSystem=full
User=www-data
OOMScoreAdjust=200
StandardOutput=journal

[Install]
WantedBy=multi-user.target
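
Since systemd does not perform shell-style parameter expansion such as ${IP##*/} inside unit files, the placeholder has to be resolved before the file is installed. One possible approach, sketched below, is to let the shell render the unit; the function name render_unit is an assumption of this example, and the unit content simply mirrors the one above.

```shell
# Hypothetical helper: print the unit file with the wiki name filled in.
# $1 = the DocumentRoot directory of the wiki, $2 = the service user
render_unit() {
    wiki_name="${1##*/}"
    cat <<EOF
[Unit]
Description=MediaWiki Job Runner for ${wiki_name}

[Service]
ExecStart=/usr/local/bin/mlw-service-runJobs-for-${wiki_name}.sh "log"
Nice=10
ProtectSystem=full
User=${2}
OOMScoreAdjust=200
StandardOutput=journal

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (assumes the IP and OWNER variables from the Envvars section):
# render_unit "$IP" "$OWNER" | sudo tee "/etc/systemd/system/mlw-service-runJobs-for-${IP##*/}.service"
```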

Enable and start the systemd service.

sudo systemctl enable --now mlw-service-runJobs-for-${IP##*/}.service
sudo systemctl status mlw-service-runJobs-for-${IP##*/}.service

Now, if the "log" option is enabled, you can tail the output log file.

tail -f /tmp/mlw-service-runJobs-${IP##*/}.log

Job Queue Cron Job

Before writing this article and doing the necessary investigation, I handled the job queue with the following script, triggered by a cron job.

/usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh
#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name: 	 /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh
# @desc 	 Run the job queue: $IP/maintenance/runJobs.php
#
# Crontab:
# * * * * * sudo -u www-data /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh    >/var/log/cron.mlw-maintenance-runJobs-${IP##*/}.sh.log 2>&1

: ${IP:="/var/www/wiki.metalevel.tech"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}                  # The user that owns the $IP directory

# Use some argument to activate CLI mode (TEST_DEPTH=2)
# otherwise the script falls back to cron mode (TEST_DEPTH=4)
if [[ -z $1 ]]; then
    TEST_DEPTH=4
else
    TEST_DEPTH=2
fi

echo ''
echo '*'
echo "* week_$(date +%W.%Y-%m-%d_%Hh:%Mm)"
echo '*'

if ps aux | grep -v 'grep' | grep -oq 'mlw-maintenance-rebuild'; then
    echo "Some of our 'mlw-maintenance-rebuild-*.sh' is running, try again later..."
    exit
fi

if [[ "$(ps aux | grep -v 'grep' | grep -c "$0")" -eq "${TEST_DEPTH}" ]]; then
    printf -- '\n*\n*\n* WIKI: %s - RunJobs begin. ------------\n\n' "${IP##*/}"
    #sudo chown -R www-data:www-data $IP/cache
    sudo -u "$OWNER" /usr/bin/php -dmemory_limit=-1 "$IP/maintenance/runJobs.php" --conf "$IP/LocalSettings.php"
else
    echo "Another instance of ${0} is running... Skip."
    echo "Use 'cli' as argument to activate the CLI mode, otherwise the script falls back to cron mode."
    echo "Test for other instances: ${TEST_DEPTH} ?= $(ps aux | grep -v 'grep' | grep -c "$0")"
fi
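
The script above detects a running instance by counting matching lines in the ps output, which depends on how the script was invoked (hence the two TEST_DEPTH values). A different technique, not the one used in the script, is a lock directory: mkdir either creates the directory or fails, in one atomic step. The sketch below only illustrates the idea; the function names and the lock path are hypothetical.

```shell
# Hypothetical lock helpers based on an atomic mkdir.
acquire_lock() {
    mkdir "$1" 2>/dev/null   # Succeeds only if the directory did not exist
}

release_lock() {
    rmdir "$1" 2>/dev/null   # Remove the (empty) lock directory
}

# Usage inside a cron-driven script (the lock path is an assumption):
# LOCK_DIR="/tmp/mlw-maintenance-runJobs.lock"
# if acquire_lock "$LOCK_DIR"; then
#     trap 'release_lock "$LOCK_DIR"' EXIT
#     # ... call runJobs.php here ...
# else
#     echo "Another instance is running... Skip."
# fi
```

The trade-off is that a lock directory left behind after a crash has to be removed by hand before the next run can start.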

Examples of crontab entries, with and without logging.

sudo crontab -e
*/5 * * * * /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh   >/dev/null 2>&1
#*/5 * * * * /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh  >/var/log/cron.mw-maintenance-runJobs.log 2>&1

Example of CLI usage.

mlw-maintenance-runJobs-${IP##*/}.sh cli
