I'm working on a function to preprocess images for training, but I've noticed that it starts off quickly and then slows down significantly. The function processes raw images stored in a directory, specifically depth and color images that follow a naming convention of `id_depth.png` and `id_color.png`. I'm using `RAW_ROOT` and `RGBD_ROOT` as path variables. Given this setup, I'm trying to determine if I'm using the most efficient method for processing these images or if there are ways to optimize it further. Here's the code I'm using:
```python
def func() -> None:
    # Path.glob returns a lazy generator of matching paths, not a list
    depth_fps = cfg.RAW_ROOT.glob("*depth.png")
    for depth_fp in depth_fps:
        # "id_depth" -> "id_" (strip the 5-character "depth" suffix)
        img_stem = depth_fp.stem[:-5]
        color_fn = img_stem + "color.png"
        color_fp = cfg.RAW_ROOT / color_fn
        rgbd_im = make_rgbd(depth_fp, color_fp)
        rgbd_fn = img_stem + "rgbd"
        np.save(cfg.RGBD_ROOT / rgbd_fn, rgbd_im)  # np.save appends ".npy"
        print(f"Saved {rgbd_fn}")
    print("Finished")
```
2 Answers
You should profile your code to pinpoint where the slowdown actually happens. When you're saving many files, I/O is often the bottleneck: writes are fast at first because they go to the OS write cache in memory, but once that fills up, the program stalls while it waits for the disk to catch up. Unfortunately, that's just how sustained disk writes behave.
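As a minimal sketch of what that profiling could look like using the standard-library `cProfile` and `pstats` (the output filename `func.prof` is just an illustrative choice, and this assumes `func` is in scope where you run it):
```python
import cProfile
import pstats

# Run the function under the profiler and dump stats to a file
cProfile.run("func()", "func.prof")

# Print the ten call sites with the largest cumulative time
stats = pstats.Stats("func.prof")
stats.sort_stats("cumulative").print_stats(10)
```
If most of the cumulative time lands in `np.save` rather than `make_rgbd`, that points at the disk.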
It sounds like your write cache is filling up. Put a timer around each step of the loop to see where the time goes; print statements aren't a reliable way to measure speed, especially for fast operations. Given that you're writing a lot of files in quick succession, particularly if you're on an older HDD, sustained write speed is the likely limit once the initial burst is over.
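A rough sketch of that per-iteration timing with `time.perf_counter`, reusing the names from the question (`cfg`, `make_rgbd`): it separates the conversion cost from the write cost, so if the save time climbs as the loop runs, the write cache is confirmed as the culprit.
```python
import time
import numpy as np

def func() -> None:
    for depth_fp in cfg.RAW_ROOT.glob("*depth.png"):
        img_stem = depth_fp.stem[:-5]
        color_fp = cfg.RAW_ROOT / (img_stem + "color.png")

        t0 = time.perf_counter()
        rgbd_im = make_rgbd(depth_fp, color_fp)  # conversion step
        t1 = time.perf_counter()
        np.save(cfg.RGBD_ROOT / (img_stem + "rgbd"), rgbd_im)  # write step
        t2 = time.perf_counter()

        print(f"{img_stem}: make_rgbd {t1 - t0:.3f}s, save {t2 - t1:.3f}s")
```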

True, but what if you could avoid writing to disk entirely? If the RGBD arrays were built in memory on demand, or produced directly by whatever generates these files in the first place, the write bottleneck would disappear altogether.
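As a rough sketch of that idea, assuming the training loop can consume arrays directly, a generator sidesteps the save/load round-trip entirely (`iter_rgbd` is a hypothetical name, and `cfg`/`make_rgbd` are the question's own setup):
```python
def iter_rgbd():
    # Yield (stem, rgbd_array) pairs on demand instead of persisting them
    for depth_fp in cfg.RAW_ROOT.glob("*depth.png"):
        img_stem = depth_fp.stem[:-5]
        color_fp = cfg.RAW_ROOT / (img_stem + "color.png")
        yield img_stem, make_rgbd(depth_fp, color_fp)
```
Whether this is viable depends on whether the conversion is cheap enough to repeat every epoch; if it is, there's no reason to pay for the disk at all.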