Commit 5e2baf54 authored by Yarny0

nixos/test-driver: fix race from filename clash in OCR

There is a race condition
in the new parallelized OCR code.
The race condition became "active" in commit
819d304a (Use futures for OCR parallelization);
however, the underlying bug already slipped in with commit
e6ea13f4 (Use proper `Path` instead of `str` in OCR code).

The OCR module applies tesseract to at most three variants
of the screenshot: the original one, and two variants that
are created by a preprocessing step (with ImageMagick).
The preprocessing step needs an output filename
that is used to write the preprocessed image file.

The "Path" commit broke the way the output file is named:
The code still attempts to append a ".negative" to *one*
of the preprocessed output files, but the method
`.with_suffix` is not suitable for that purpose:
Later on, ".png" is also added with `.with_suffix`,
*replacing* the ".negative" and thereby yielding
*the same* output filename for both preprocessed files.
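
The suffix behaviour can be reproduced with plain `pathlib`.
This is only a minimal sketch: the screenshot path
`/tmp/screenshot.png` is hypothetical, and it assumes that
`out_file` starts out as the screenshot path itself.

```python
from pathlib import PurePosixPath

# Hypothetical screenshot path, for illustration only.
screenshot = PurePosixPath("/tmp/screenshot.png")

# Old naming, negated variant: ".negative" replaces ".png" ...
negative = screenshot.with_suffix(".negative")
# ... and the later ".png" replaces ".negative" again.
negative = negative.with_suffix(".png")

# Old naming, non-negated variant: only the final ".png" is applied.
positive = screenshot.with_suffix(".png")

print(negative)  # /tmp/screenshot.png
print(positive)  # /tmp/screenshot.png
```

Both variants end up with the same name because `.with_suffix`
swaps the last suffix instead of appending a new one.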

Without parallelization, this doesn't hurt;
preprocessed files are simply created and analyzed in order.
But the parallelization commit
makes these two tasks run in parallel
(plus the third task that analyzes the original screenshot,
but that does not cause any further harm here):

* Task 1: preprocess (non-negative), then tesseract the output
* Task 2: preprocess (negative), then tesseract the output

Both tasks use the same filename and thus the same file for the
preprocessed image that is generated, then used by tesseract.
This often creates a garbage file since both
preprocessing steps write that one file at the same time.
Tesseract consequently fails and
complains about bad data in its input file.

The commit at hand simply fixes the file naming
by adding ".negative.png" or ".positive.png"
to the filename for the preprocessed image.
This ensures both threads no longer hurt each
other's data and can now coexist in peace.
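
The new naming can be sketched in isolation as follows.
The helper `preprocessed_name` and the path `/tmp/screenshot.png`
are hypothetical and only mirror the `with_name` calls of this commit;
they are not part of the test driver.

```python
from pathlib import PurePosixPath


def preprocessed_name(screenshot: PurePosixPath, negate: bool) -> PurePosixPath:
    """Hypothetical helper: build a distinct output name per variant."""
    kind = "negative" if negate else "positive"
    # with_name replaces the whole file name, so nothing gets lost later.
    return screenshot.with_name(f"{screenshot.stem}.{kind}.png")


shot = PurePosixPath("/tmp/screenshot.png")
print(preprocessed_name(shot, negate=True))   # /tmp/screenshot.negative.png
print(preprocessed_name(shot, negate=False))  # /tmp/screenshot.positive.png
```

Since the full name is built in one step, the two preprocessing
tasks can no longer end up with the same output file.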
parent 605cfcce
+3 −2
@@ -104,7 +104,9 @@ def _preprocess_screenshot(screenshot_path: Path, negate: bool = False) -> Path:

     if negate:
         magick_args.append("-negate")
-        out_file = out_file.with_suffix(".negative")
+        out_file = out_file.with_name(f"{out_file.stem}.negative.png")
+    else:
+        out_file = out_file.with_name(f"{out_file.stem}.positive.png")

     magick_args += [
         "-gamma",
@@ -112,7 +114,6 @@ def _preprocess_screenshot(screenshot_path: Path, negate: bool = False) -> Path:
         "-blur",
         "1x65535",
     ]
-    out_file = out_file.with_suffix(".png")

     ret = subprocess.run(
         ["magick", "convert"] + magick_args + [screenshot_path, out_file],