I have a CSV file that has three fields, but I'm primarily interested in the first field (which contains full names) and the third field (which includes one or more URLs). The URLs are enclosed in double quotes if there's more than one and are separated by a comma and a space.
I need to download the files linked in the third field into a folder, renaming them according to the first field. For example, if the first field contains 'Jane Doe', any downloaded file from the corresponding third field should be named 'jane-doe.png' or 'jane-doe.pdf'.
I'm a bit stuck on how to handle the third field, since it can contain multiple URLs, and on how to automate the whole task efficiently. Any suggestions?
3 Answers
Here's a script you might find helpful:
```bash
# make sure the target folder exists
mkdir -p downloads

# Field 1 = full name, field 2 = ignored, field 3 = URL(s).
# With IFS=, the third variable receives everything after the second
# comma, so a quoted multi-URL field arrives intact.
while IFS=, read -r name _ urls; do
    name=$(echo "$name" | tr -d '"' | xargs)    # strip quotes and surrounding whitespace
    urls=$(echo "$urls" | tr -d '"')            # strip the quotes around multi-URL fields
    slug=$(echo "$name" | tr '[:upper:]' '[:lower:]' | tr ' ' '-')   # "Jane Doe" -> "jane-doe"

    i=1
    while IFS= read -r url; do
        url=$(echo "$url" | xargs)              # trim the space left after each comma
        [ -z "$url" ] && continue
        ext="${url##*.}"                        # extension taken from the end of the URL
        if [ "$i" -eq 1 ]; then
            outfile="${slug}.${ext}"
        else
            outfile="${slug}-${i}.${ext}"       # number extra files for the same person
        fi
        echo "Downloading: $url -> $outfile"
        curl -sL -o "downloads/${outfile}" "$url"
        ((i++))
    done <<< "$(echo "$urls" | tr ',' '\n')"    # split the URL list onto separate lines
done < input.csv
```
Just remember to skip any header rows first!
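For instance, one way to drop a single header line before the loop reads the file (the output file name here is just a placeholder):
```bash
# keep everything except the first line, then point the loop's
# final redirection at the header-less copy instead of input.csv
tail -n +2 input.csv > input-noheader.csv
```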
Combining AWK for the CSV parsing with xargs for the downloads is a great approach: AWK can strip the quotes, work out the output names, and emit one name/URL pair per file, and xargs then runs the download command for each pair, so all the awkward URL formatting is handled in one place.
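As a rough sketch of that idea (assuming the CSV is named input.csv, has no header line, and only the third field is ever quoted):
```bash
mkdir -p downloads

# awk prints one "outfile url" pair per download; xargs runs a curl for each pair
awk -F',' '
{
    gsub(/"/, "")                              # drop the quotes; $0 is re-split on commas
    name = tolower($1); gsub(/ /, "-", name)   # "Jane Doe" -> "jane-doe"
    for (i = 3; i <= NF; i++) {                # fields 3..NF are the individual URLs
        url = $i; gsub(/^ +| +$/, "", url)     # trim the space left after each comma
        ext = url; sub(/.*\./, "", ext)        # extension = text after the last dot
        suffix = ""
        if (i > 3) suffix = "-" (i - 2)        # number extra files for the same person
        printf "%s%s.%s %s\n", name, suffix, ext, url
    }
}' input.csv | xargs -n 2 sh -c 'curl -sL -o "downloads/$1" "$2"' sh
```
xargs reads the pairs whitespace-separated, which works here because the slugified name and the URL contain no spaces.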
You can use AWK's split() function to handle the URLs: for each line, split the third field on the commas into an array, print one URL per element into a list file, and then pass that file to a single wget command with the -i option. That way you avoid launching a separate download command for every file.
That's a solid method! Just to clarify, how do you handle cases where URLs are quoted?

Could you provide a brief example of how you would set this up?
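Roughly like this, as a sketch (assuming the file is input.csv with no header; the list-file name is just a placeholder). gsub() strips the quotes that were asked about above, then split() breaks the field into individual URLs:
```bash
# collect every URL from the third field into one list file
awk '
{
    third = $0
    sub(/^[^,]*,[^,]*,/, "", third)        # keep only what follows the second comma
    gsub(/"/, "", third)                   # drop the quotes around multi-URL fields
    n = split(third, urls, ", *")          # split on a comma plus optional spaces
    for (i = 1; i <= n; i++) print urls[i]
}' input.csv > url-list.txt

# a single wget call then fetches everything into downloads/
wget -q -P downloads -i url-list.txt
```
One caveat: wget -i saves each file under its original remote name, so renaming the downloads after the first field still has to happen as a separate step, which is what the looping approaches above take care of.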