Always good to check your outputs.
In the trimmomatic step of our pipeline, we use our
catcher function to see whether all samples were processed.
Because catcher returns the IDs of any missing samples, and here it
returns none, it suggests that everything went fine.
# again, using the catcher function we wrote in the programme set-up stage.
catcher $filt
# looks okay, but...
However, if we check the .slrm log outputs from the trimmomatic step,
we will see a different story:
> ./omm__2__trimmo4.1228821.slrm:slurmstepd: error: *** JOB 1228821 ON adacompute01 CANCELLED AT 2026-06-16T17:09:21 DUE TO TIME LIMIT ***
> ./omm__2__trimmo4.1228846.slrm:slurmstepd: error: *** JOB 1228846 ON adacompute01 CANCELLED AT 2026-06-16T17:36:51 DUE TO TIME LIMIT ***
> ./omm__2__trimmo4.1228851.slrm:slurmstepd: error: *** JOB 1228851 ON adacompute01 CANCELLED AT 2026-06-16T17:39:51 DUE TO TIME LIMIT ***
> ./omm__2__trimmo4.1228853.slrm:slurmstepd: error: *** JOB 1228853 ON adacompute01 CANCELLED AT 2026-06-16T17:43:21 DUE TO TIME LIMIT ***
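The lines above look like the output of a grep across several log files. The exact command is not shown, so the following is only a sketch: it assumes the logs end in `.slrm` and that SLURM writes the phrase `DUE TO TIME LIMIT` when it cancels a job, as in the lines above. `find_timeouts` is a hypothetical helper name.

```shell
# Hypothetical helper: print every time-limit cancellation, prefixed by its log file.
# -H forces the file name to be printed even if only one file is searched.
find_timeouts () {
    grep -H "DUE TO TIME LIMIT" "$@"
}

# usage (file pattern assumed from the log names shown above):
# find_timeouts ./omm__2*.slrm
```

Searching for the full phrase rather than just "error" keeps unrelated warnings out of the results.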
We have a problem - 4 samples (note that the IDs are not mentioned)
took longer than the time limit we defined in our slurm script, so
the HPC killed those processes. However, because
trimmomatic had already started running, some output files were produced,
and these were picked up by catcher, which is why it looked
like everything was fine. We can assume that these four files are
broken, malformed, incomplete - or all of the above.
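One way to test that assumption rather than rely on it: if the filtered outputs are gzip-compressed (an assumption; the file type is not stated here), `gzip -t` will flag files that were cut off mid-write. A minimal sketch, where `list_broken_gz` is a hypothetical helper:

```shell
# Hypothetical helper: print every .gz file in a directory that fails gzip's integrity test.
# Assumes the trimmomatic outputs are gzip-compressed; plain-text FASTQ needs a different check.
list_broken_gz () {
    local dir="$1" f
    for f in "$dir"/*.gz ; do
        [ -e "$f" ] || continue                 # skip if the pattern matched nothing
        gzip -t "$f" 2>/dev/null || echo "$f"   # report files that fail the test
    done
}

# usage:
# list_broken_gz "$filt"
```

A file that passes is a complete gzip stream, which is a good sign but does not by itself prove that every read was processed.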
So, what to do? We need to get those IDs, increase the time limit on
trimmomatic, and re-submit those 4 samples to SLURM. We
could also consider deleting the incomplete outputs, just to be
safe.
You can do this manually if you like (it would take about three minutes), but for fun’s sake we’re going to script it here (an hour to code, plus 1 second to run).
# the -l option (lowercase L) just returns the names of the matching logfiles - we grab the IDs from those.
grep -l "rror" ./omm__2*
#
# add that grep into a for-loop and search for sample IDs:
for slrm in $( grep -l "rror" ./omm__2* ) ;
do
grep -E ".*__join\/.*_R1.*" $slrm ;
done
## no good solution yet :(
# # challenge: output is long, with multiple instances of that ID. how to parse properly?
# for slrm in $( grep -l "rror" ./omm__2* ) ;
# do
# grep -E ".*__join\/.*_R1.*" $slrm | sed -E 's/.*__join\/(.*)_R1.*/########\1#######/g' ;
# done
Copy and paste and write it down. No shame.
vaginal-69-Visit-1_S36
vaginal-95-Visit-1_S45
vaginal-98-Visit-3_S96
vaginal-9-Visit-3_S56
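One possible way to close the parsing challenge above, offered as a sketch rather than a solution tested on these logs: let grep print only the matching part of each line (`-o`), strip everything around the ID with sed, and collapse repeats with `sort -u`. It assumes, like the grep pattern above, that each ID appears in the log as `...__join/<ID>_R1...` with no spaces in the path. `extract_ids` is a hypothetical helper name.

```shell
# Hypothetical helper: print each sample ID found in the given logs, once.
extract_ids () {
    # -o: print only the matched text; -h: drop the file-name prefix; -E: extended regex
    grep -ohE '__join/[^[:space:]]*_R1' "$@" |
        sed -E 's#.*__join/(.*)_R1#\1#' |   # keep only the part between __join/ and _R1
        sort -u                             # one line per ID
}

# usage: feed it only the logs that contain an error
# extract_ids $( grep -l "rror" ./omm__2* )
```

If the paths in your logs contain spaces, or the ID is followed by something other than `_R1`, the pattern will need adjusting.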
First, inspect and then delete the broken versions:
# always inspect before deleting something - just to be sure.
ls -l $filt/{vaginal-69-Visit-1_S36,vaginal-95-Visit-1_S45,vaginal-98-Visit-3_S96,vaginal-9-Visit-3_S56}*
# Note: bash first expands $filt/{a,b,c,d}* into $filt/a* $filt/b* $filt/c* $filt/d* (brace expansion),
# then each * matches any file name starting with that ID (globbing).
# if you're sure, delete them!
rm $filt/{vaginal-69-Visit-1_S36,vaginal-95-Visit-1_S45,vaginal-98-Visit-3_S96,vaginal-9-Visit-3_S56}*
Then increase the max time limit in the trimmo4.sh script,
then refire with SLURM:
for id in vaginal-69-Visit-1_S36 vaginal-95-Visit-1_S45 vaginal-98-Visit-3_S96 vaginal-9-Visit-3_S56 ;
do
sbatch $mat/${proj}__slurm__trimmo4.sh $id $join $filt ;
done
Seems to work fine…
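Editing the script is one way to raise the limit. Another, sketched here, is to pass `--time` to `sbatch` at submission, since options given on the command line take precedence over `#SBATCH` lines in the script. The `04:00:00` default and the `resubmit` helper are placeholders, not values from this pipeline. Setting `DRY_RUN=1` prints the command instead of submitting it.

```shell
# Hypothetical helper: resubmit one sample with a longer wall-time limit.
# Relies on $mat, $proj, $join and $filt being set, as in the loop above.
resubmit () {
    local id="$1"
    local limit="${2:-04:00:00}"   # placeholder limit (HH:MM:SS); pick one that suits your data
    # With DRY_RUN set, 'echo' is placed in front, so the command is printed, not run.
    ${DRY_RUN:+echo} sbatch --time="$limit" "$mat/${proj}__slurm__trimmo4.sh" "$id" "$join" "$filt"
}

# usage: check the command first, then submit for real
# DRY_RUN=1 resubmit vaginal-69-Visit-1_S36
# resubmit vaginal-69-Visit-1_S36
```

Doing a dry run first is a cheap way to catch an unset variable before four jobs go into the queue.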
under construction and solutions can go here in the future.