MediaWiki Job Queue

From WikiMLT

I'm writing this article while migrating this wiki from MediaWiki version 1.38 to version 1.39. According to the CirrusSearch extension's page, I also needed to migrate from Elasticsearch version 6.8.23 to version 7.10.2.

After installing Elasticsearch version 7.10.2, instead of following the Upgrade manual I rebuilt the Elasticsearch data from scratch, which left me with a seemingly endless MediaWiki job queue. For this reason I needed to become much more familiar with the MediaWiki maintenance scripts related to the job queue.

Envvars

The environment variables used in the following commands.

IP="/var/www/wiki.example.com" # The DocumentRoot directory of the wiki
OWNER="www-data"               # The user that owns the $IP directory

Show Jobs

sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php"
15    # The count of all pending jobs
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --group
cirrusSearchLinksUpdatePrioritized: 2 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
htmlCacheUpdate: 4 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
recentChangesUpdate: 1 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
refreshLinksDynamic: 7 queued; 0 claimed (0 active, 0 abandoned); 0 delayed
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --list
cirrusSearchLinksUpdatePrioritized MediaWiki_Job_Queue addedLinks=[] removedLinks=[] prioritize=1 cluster= namespace=0 title=MediaWiki_Job_Queue requestId=ZASOXkzdk5n0fxsD5JVbLgAAQRA (id=6794955,timestamp=20230305124311) status=unclaimed
...
htmlCacheUpdate MediaWiki_Job_Queue table=templatelinks recursive=1 rootJobIsSelf=1 rootJobSignature=fcd541151cba4e3ac5b300da797a3163c93407bc rootJobTimestamp=20230305124311 causeAction=page-edit namespace=0 title=MediaWiki_Job_Queue requestId=ZASOXkzdk5n0fxsD5JVbLgAAQRA causeAgent=unknown (id=6794956,timestamp=20230305124311) status=unclaimed
...
recentChangesUpdate Special:RecentChanges type=cacheUpdate namespace=-1 title=RecentChanges requestId=ZASOUUzdk5n0fxsD5JVa-QAAUQs (id=6794954,timestamp=20230305124307) status=unclaimed
refreshLinksDynamic Kali_Linux_Install_GUFW_(gui-ufw) isOpportunistic=1 rootJobTimestamp=20230305121652 namespace=0 title=Kali_Linux_Install_GUFW_(gui-ufw) requestId=ZASIM9BBdr0qUp1CuMirLwAAABA causeAction=unknown causeAgent=unknown (id=6794945,timestamp=20230305121653) status=unclaimed
...
sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --type cirrusSearchLinksUpdatePrioritized
2    # The count of the pending jobs for the specified --type

If the wiki is part of a wiki family, you may need to specify the wiki ID, as shown below.

sudo -u ${OWNER} php "${IP}/maintenance/showJobs.php" --wiki="${wikiId}"
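
The per-type counts printed by --group can be extracted with a little text processing, for example to feed a monitoring check. The following is a minimal sketch, not part of MediaWiki: the function name parse_job_groups is hypothetical, and it assumes the line format shown in the --group output above.

```shell
#!/bin/bash

# Hypothetical helper (not a MediaWiki script): read the output of
# `showJobs.php --group` on stdin and print "<jobType> <queuedCount>"
# for each line. Assumes the format "<type>: <N> queued; ..." shown above.
parse_job_groups() {
    awk -F': ' 'NF >= 2 { split($2, fields, " "); print $1, fields[1] }'
}

# Usage on a real wiki (requires the IP and OWNER variables from above):
# sudo -u "${OWNER}" php "${IP}/maintenance/showJobs.php" --group | parse_job_groups
```

Each output line then has only two fields, which is easier to compare against a threshold than the full status line.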

Manage Jobs

Re-push abandoned jobs of a certain type.

sudo -u ${OWNER} php "${IP}/maintenance/manageJobs.php" --type typeName --action "repush-abandoned"

Delete jobs of a certain type.

sudo -u ${OWNER} php "${IP}/maintenance/manageJobs.php" --type typeName --action "delete"

Use showJobs.php --group, as shown above, to find the job types present in the current job queue. Note that the quote marks around the action values in the examples above are there only for better highlighting.
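
The two steps above (find the job types, then re-push their abandoned jobs) can be chained. This is a sketch under assumptions rather than a tested procedure: the function names types_with_abandoned and repush_all_abandoned are hypothetical, and the parsing relies on the "N abandoned" field in the --group output format shown earlier.

```shell
#!/bin/bash

: "${IP:=/var/www/wiki.example.com}"   # The DocumentRoot directory of the wiki
: "${OWNER:=www-data}"                 # The user that owns the $IP directory

# Hypothetical helper: read `showJobs.php --group` output on stdin and
# print only the job types that report at least one abandoned job.
types_with_abandoned() {
    awk -F': ' 'NF >= 2 && match($2, /[0-9]+ abandoned/) {
        count = substr($2, RSTART, RLENGTH) + 0
        if (count > 0) print $1
    }'
}

# Hypothetical wrapper: re-push abandoned jobs for every such type.
# Defined only; call it explicitly on a real wiki.
repush_all_abandoned() {
    sudo -u "${OWNER}" php "${IP}/maintenance/showJobs.php" --group \
        | types_with_abandoned \
        | while read -r job_type; do
            # </dev/null keeps the inner command from consuming the type list
            sudo -u "${OWNER}" php "${IP}/maintenance/manageJobs.php" \
                --type "${job_type}" --action repush-abandoned </dev/null
        done
}
```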

Job Queue Service

Job Queue Cron Job

Before writing this article and doing the necessary investigation, I dealt with the job queue using the following script, triggered by a cron job.

/usr/local/bin/mlw-maintenance-runJobs-for-${IP##*/}.sh
#!/bin/bash

# @author    Spas Z. Spasov <spas.z.spasov@metalevel.tech>
# @copyright 2022 Spas Z. Spasov
# @license   https://www.gnu.org/licenses/gpl-3.0.html GNU General Public License, version 3 (or later)
#
# @name      /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh
# @desc      Run the job queue: $IP/maintenance/runJobs.php
#
# Crontab:
# * * * * * sudo -u www-data /usr/local/bin/mlw-maintenance-runJobs-${IP##*/}.sh    >/var/log/cron.mlw-maintenance-runJobs-${IP##*/}.sh.log 2>&1
# * * * * * sudo -u www-data /usr/local/bin/mlw-maintenance-runJobs-wiki.metalevel.tech.sh    >/var/log/cron.mlw-maintenance-runJobs-wiki.metalevel.tech.sh.log 2>&1

: ${IP:="/var/www/wiki.metalevel.tech"} # The DocumentRoot directory of the wiki
: ${OWNER:="www-data"}                  # The user that owns the $IP directory

# Pass any argument to activate CLI mode (TEST_DEPTH=2);
# otherwise the script falls back to cron mode (TEST_DEPTH=4)
if [[ -z $1 ]]; then
    TEST_DEPTH=4
else
    TEST_DEPTH=2
fi

echo ''
echo "*"
echo '*'
echo "* week_$(date +%W.%Y-%m-%d_%Hh:%Mm)"
echo '*'

if ps aux | grep -v 'grep' | grep -oq 'mlw-maintenance-rebuild'; then
    echo "One of our 'mlw-maintenance-rebuild-*.sh' scripts is running, try again later..."
    exit
fi

if [[ "$(ps aux | grep -v 'grep' | grep -c "$0")" -eq "${TEST_DEPTH}" ]]; then
    printf -- '\n*\n*\n* WIKI: %s - RunJobs begin. ------------\n\n' "${IP##*/}"
    #sudo chown -R www-data:www-data $IP/cache
    sudo -u "$OWNER" /usr/bin/php -dmemory_limit=-1 "$IP/maintenance/runJobs.php" --conf "$IP/LocalSettings.php"
else
    echo "Another instance of ${0} is running... Skip."
    echo "Use 'cli' as argument to activate the CLI mode; otherwise the script falls back to cron mode."
    echo "Test for other instances: ${TEST_DEPTH} ?= $(ps aux | grep -v 'grep' | grep -c "$0")"
fi
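
The script above detects concurrent runs by counting ps matches, and the expected count differs between cron and CLI invocations (TEST_DEPTH). A different technique, not the one used in the script, is to hold a lock file with flock from util-linux. The following is a minimal sketch, assuming flock is installed; the lock file path and the function name run_exclusive are hypothetical.

```shell
#!/bin/bash

: "${IP:=/var/www/wiki.metalevel.tech}"   # The DocumentRoot directory of the wiki

# Hypothetical lock file location; any path writable by the running user works.
LOCK_FILE="${LOCK_FILE:-/tmp/mlw-maintenance-runJobs-${IP##*/}.lock}"

# Run the given command only if no other process holds the lock.
# The lock is bound to file descriptor 9 and is released when the subshell exits.
run_exclusive() {
    (
        if ! flock -n 9; then
            echo "Another instance is running... Skip."
            exit 1
        fi
        "$@"
    ) 9>"${LOCK_FILE}"
}

# Usage on a real wiki:
# run_exclusive /usr/bin/php -dmemory_limit=-1 "${IP}/maintenance/runJobs.php" --conf "${IP}/LocalSettings.php"
```

With this approach the same check works regardless of how the script is started, since it does not depend on the number of matching processes.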

References