I'm trying to figure out the best method for capturing output from a function in my shell scripts. Should I use command substitution by assigning it to a variable or pass a variable name to the function using the `declare -n` method? Here's the context: I'm calling a function that queries an API, which returns a JSON string that I later parse. I have to do this for four different API endpoints to gather all the information I need. I prefer to keep related data organized in a dictionary. I know it's a bit nitpicky, but I can't decide which approach is better. Should I use `my_dict[json]="$(some_func)"` or call `some_func _my_dict`? Also, is there a significant performance difference when using command substitution because of the subshell it creates?
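For reference, the two options in the question can be sketched side by side. This is a minimal sketch, assuming bash 4.3+ (needed for namerefs); `fetch_json`, `fetch_json_into`, and the fixed JSON string are hypothetical stand-ins for the real API call.

```bash
#!/usr/bin/env bash
# Assumes bash 4.3+ (associative arrays and 'declare -n' namerefs).

# Hypothetical stand-in for the API query: prints a fixed JSON string.
fetch_json() {
    printf '%s\n' '{"status":"ok"}'
}

# Option 1: the function prints and the caller captures with $(...).
declare -A by_capture
by_capture[json]="$(fetch_json)"

# Option 2: the caller passes the dictionary's name and the function
# writes into it through a nameref, so no subshell is involved.
fetch_json_into() {
    declare -n _out="$1"          # _out aliases the caller's array
    _out[json]='{"status":"ok"}'
}

declare -A by_ref
fetch_json_into by_ref

printf 'capture: %s\n' "${by_capture[json]}"
printf 'nameref: %s\n' "${by_ref[json]}"
```

One caveat with the nameref version: the caller must not pass a variable that is itself named `_out`, or bash reports a circular name reference.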
6 Answers
I'd definitely recommend using command substitution like `foo=$(function)`. The concern about subshell spawning is often overstated unless you're forking a massive number of times in a loop; normally, it's hardly a problem. Just watch out for extra output: anything your function writes to stdout, debug messages included, ends up in the captured value. Multi-line output adds complexity too, since command substitution strips trailing newlines.
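A small sketch of both pitfalls, assuming bash; the function `noisy` is made up for illustration. Diagnostics sent to stderr stay out of the capture, and the trailing newlines the function prints are stripped from the result.

```bash
#!/usr/bin/env bash

# Hypothetical function: logs to stderr, prints its result to stdout.
noisy() {
    echo "debug: starting" >&2      # stderr is not captured by $(...)
    printf 'line1\nline2\n\n\n'     # note the extra trailing newlines
}

out="$(noisy)"

# The embedded newline survives; the trailing ones do not.
printf '[%s]\n' "$out"
```

If the debug line were written to stdout instead, it would become part of `out`, which is usually the source of "mysterious" extra text in captured values.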
If you're getting multiple output lines, try writing them to a temporary file created with `mktemp` and reading the file back in the caller.
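One way this might look, assuming bash 4+ for `mapfile`; `produce_lines` is a hypothetical function standing in for whatever generates the output.

```bash
#!/usr/bin/env bash

# Hypothetical producer of multi-line output.
produce_lines() {
    printf '%s\n' "first line" "second line" "third line"
}

tmp="$(mktemp)" || exit 1
trap 'rm -f "$tmp"' EXIT            # clean up the temp file on exit

produce_lines > "$tmp"              # function runs in the current shell
mapfile -t lines < "$tmp"           # one array element per line

printf 'got %d lines\n' "${#lines[@]}"
```

Because the function's output is redirected rather than captured, it runs in the current shell, so any variables it sets remain visible afterwards.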
I think it's smart to write your functions to accept both options, similar to how `printf` either prints its result or, with `-v`, assigns it to a named variable. Premature optimization is the root of all evil, so don't sweat it too much! You're really only optimizing for microseconds here. Go for what's simplest for you, and move on to the next task.
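A rough sketch of a function that supports both styles, assuming bash; `get_greeting` and its `-v` option are invented here to mirror `printf -v`, not taken from any library.

```bash
#!/usr/bin/env bash

# Hypothetical function: prints its result, or stores it in the
# variable named after -v, mirroring how 'printf -v' behaves.
get_greeting() {
    local value="hello from get_greeting"
    if [[ "$1" == "-v" && -n "$2" ]]; then
        printf -v "$2" '%s' "$value"    # assign to the caller's variable
    else
        printf '%s\n' "$value"          # print for capture or display
    fi
}

get_greeting -v result                  # no subshell
captured="$(get_greeting)"              # command substitution

printf '%s\n' "$result" "$captured"
```

The caller should avoid passing a name that matches one of the function's own locals (`value` here), since the assignment would then hit the local instead.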
This is a fantastic idea! I initially checked a return status, but I think it'll be beneficial to have functions that take in options. I haven't done that often, but it's something to consider.
Testing shows that subshells add minimal overhead. If you time iterations, you'll see the difference is negligible unless you're running an extremely high number of calls. Just focus on clarity in your script!
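If you want to measure it yourself, something like the following would do, assuming bash. The functions `direct` and `printer` and the iteration count are arbitrary choices for illustration, and the timings will vary by machine.

```bash
#!/usr/bin/env bash

# Hypothetical functions: one sets a variable, one prints for capture.
direct()  { REPLY="x"; }
printer() { printf 'x'; }

n=1000

echo "direct assignment:"
time for ((i = 0; i < n; i++)); do direct; v=$REPLY; done

echo "command substitution:"
time for ((i = 0; i < n; i++)); do v="$(printer)"; done
```

Dividing the second timing by `n` gives a rough per-call cost of the capture, which you can weigh against how many calls your script actually makes.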
To avoid subshells, which can be slower, consider static variable names. If you don't need unique variable names, just assign them directly in your function without any special declarations. If you want the variable to be limited to the calling function's scope, declare it with `local` in the calling function before the call; the assignment inside the called function then lands in that local variable.
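This relies on bash's dynamic scoping. A minimal sketch, with the hypothetical names `get_value`, `caller_func`, and `result`:

```bash
#!/usr/bin/env bash

# Hypothetical callee: assigns to a fixed, agreed-upon variable name.
get_value() {
    result="computed"       # no declaration: uses the nearest 'result'
}

caller_func() {
    local result            # keeps 'result' out of the global scope
    get_value               # runs in the same shell, no fork
    printf '%s\n' "$result"
}

caller_func

# 'result' was local to caller_func, so it is unset out here.
printf 'global result is %s\n' "${result-unset}"
```

Without the `local` line in `caller_func`, the assignment in `get_value` would create a global `result` instead.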
Could you clarify why we should avoid subshells? Is it really as slow as some claim?
From what I understand, `foo=$(function)` does run the function in a subshell, so Bash forks, but no new program has to be exec'd, so the overhead stays small. Prioritize writing your code for readability. If you notice inefficiencies later, you can always profile it and optimize what truly matters.
Interesting! It seems like for functions, even if there's a fork, it doesn't look too bad overall.
Thanks for the advice! I usually go with your suggestion when I just need the output. I only use a reference when I need to change a value in the dictionary. I prefer consistency in my functions: either they return a status code or print a string for capturing. I'm also trying to figure out the best way to return arrays—I've been using a reference for that, but now I'm torn between that and capturing output with `readarray`.
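For the array case, the two options might look like this. It is a sketch assuming bash 4.3+; `list_into` and `list_print` are hypothetical names, and the elements are sample data.

```bash
#!/usr/bin/env bash
# Assumes bash 4.3+ (namerefs) and a system with process substitution.

# Option 1: fill the caller's array through a nameref.
list_into() {
    declare -n _arr="$1"
    _arr=("alpha" "beta gamma" "delta")
}

# Option 2: print one element per line for the caller to read.
list_print() {
    printf '%s\n' "alpha" "beta gamma" "delta"
}

by_ref=()
list_into by_ref

readarray -t by_capture < <(list_print)

printf 'by_ref has %d items, by_capture has %d items\n' \
    "${#by_ref[@]}" "${#by_capture[@]}"
```

The `readarray` version keeps the "functions print, callers capture" convention, but it splits on newlines, so it breaks if an element can itself contain a newline. The nameref version has no such limit.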