How can I optimize my Bash script that uses ‘find’ for processing large file lists?

Asked By CuriousCoder123 On

I'm working on a Bash script that processes a large number of files named "output*.txt". The script removes an old result file, then uses 'find' to run a series of commands on each file. My goal is to take the sixth-to-last and third-to-last lines of each file and grab the second column from each (always an integer). However, the current approach starts a new shell for every file, which seems to slow things down. Would it be faster to load the file list into a variable, or to use a loop instead of 'find -exec'? It only takes about a minute to run, but I'm looking for ways to streamline it. Here's the relevant script:

#!/usr/bin/bash
rm -fv plot.dat
find . -iname "output*.txt" -exec sh -c '
BASE=$(tail -6 < {} | head -n 1 | cut -d " " -f 2)
FAKE=$(tail -3 < {} | head -n 1 | cut -d " " -f 2)
echo "$BASE $FAKE" >> plot.dat
' {} \;
sort -k1 -n < plot.dat
echo "All done"

3 Answers

Answered By TailGuru88 On

I suggest letting `tail` open the files directly instead of redirecting them to its stdin. Also, if the order of processing doesn't matter, you could explore `xargs` or GNU `parallel` to speed things up. Both can hand many files to a single command and can run several jobs at once, which may cut down your runtime.
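As a rough illustration of that suggestion, here is a minimal sketch that batches the files through `xargs`, so one `sh` process handles many files, and passes each file name to `tail` as an argument. The function name `extract_pairs` is made up for this example, and `-print0`, `-0` and `-r` assume GNU (or compatible) `find` and `xargs`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): prints one
# "<col2 of 6th-last line> <col2 of 3rd-last line>" pair per matching file.
extract_pairs() {
    local dir=${1:-.}
    # -print0 / -0 keep file names with spaces or newlines intact;
    # -r (GNU xargs) skips running sh when find matches nothing.
    find "$dir" -iname 'output*.txt' -print0 |
        xargs -0 -r sh -c '
            # One shell per batch of files instead of one per file.
            for f in "$@"; do
                base=$(tail -n 6 -- "$f" | head -n 1 | cut -d " " -f 2)
                last=$(tail -n 3 -- "$f" | head -n 1 | cut -d " " -f 2)
                printf "%s %s\n" "$base" "$last"
            done
        ' sh
}
```

It could be used as `extract_pairs . | sort -k1 -n > plot.dat`. Adding something like `-n 500 -P 4` to `xargs` would run several batches in parallel; the output order then becomes unpredictable, which the final `sort` makes harmless, though whether it helps would need measuring.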

Answered By LoopLover42 On

Why not just store the output of the find command in a variable? You could do something like this:

`files=$(find . -name "output*.txt")`

And then process each file with a for loop:

`for file in $files; do ... done`
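
This keeps the file handling simple and avoids spawning a new shell for each file. Note that the unquoted `$files` is split on whitespace, so it only works when no file name contains spaces or newlines.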
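A whitespace-safe variant of the loop might look like the sketch below. It reads NUL-delimited names from `find` and uses Bash's `mapfile` (Bash 4 or later) so that only one `tail` runs per file. The function name `collect_pairs` is hypothetical, and `read` splits on any run of whitespace, which differs slightly from `cut -d " "` when columns are separated by more than one space.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): loops over the
# files in the current shell and prints one pair of values per file.
collect_pairs() {
    local dir=${1:-.} file base last
    local -a lines
    while IFS= read -r -d '' file; do
        # Read the last six lines into an array: index 0 is the
        # 6th-last line, index 3 is the 3rd-last line.
        mapfile -t lines < <(tail -n 6 -- "$file")
        # Split each line on whitespace and keep the second field.
        read -r _ base _ <<< "${lines[0]}"
        read -r _ last _ <<< "${lines[3]}"
        printf '%s %s\n' "$base" "$last"
    done < <(find "$dir" -name 'output*.txt' -print0)
}
```

For example, `collect_pairs . | sort -k1 -n > plot.dat` would rebuild the result file in one pass. The sketch assumes every file has at least six lines; shorter files would produce wrong or empty fields.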

Answered By ScriptingNerd79 On

You can streamline the process by combining 'find' and 'awk'. Instead of spawning a shell for every file, try this approach:

`find . -name "output*.txt" -exec tail -q -n 6 {} + |
awk 'NR % 6 == 1 { base_val=$2 } NR % 6 == 4 { print base_val, $2 }' >> plot.dat`
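
This keeps everything in a single pipeline: `-exec ... +` passes many files to each `tail` call, and the `-q` option (available in GNU `tail`) suppresses the per-file headers that `tail` would otherwise print, so `awk` sees a continuous stream of six lines per file. It assumes every file has at least six lines.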
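If some files might be shorter than six lines, or `tail -q` is not available, a similar effect can be sketched in `awk` alone by keeping the second column of the last six lines in a small ring buffer and emitting a pair whenever a new file starts. This is an alternative under those assumptions, not the answer's own method, and the function name `last_pairs` is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original answer): awk reads the
# files directly and remembers column 2 of the last six lines per file.
last_pairs() {
    local dir=${1:-.}
    find "$dir" -name 'output*.txt' -exec awk '
        # Print the pair for the file that just ended (n = its line count).
        # Files with fewer than six lines are skipped.
        function emit() {
            if (n >= 6) print col[(n - 6) % 6], col[(n - 3) % 6]
        }
        # First line of a new file: flush the previous file first.
        FNR == 1 && NR > 1 { emit() }
        # Store column 2 in a six-slot ring buffer keyed by line number.
        { n = FNR; col[(FNR - 1) % 6] = $2 }
        # Flush the last file of this awk invocation.
        END { if (NR > 0) emit() }
    ' {} +
}
```

As with the pipeline above, the output can be redirected, for example `last_pairs . | sort -k1 -n > plot.dat`. Reading whole files in `awk` may be slower than `tail` on very large files, since `tail` can seek to the end.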
