I'm looking to extract pairs of `title` and `file` attributes from XML files structured in a specific way. Each `title` appears first, followed by its corresponding `file` attribute. For example:
```
title="*"
file="*"
```
The XPath for the `title` is `/MediaContainer/Video/@title` and for the `file` it is `/MediaContainer/Video/Media/Part/@file`. I've tried using a command that gets only one of them:
```
find . -iname '*.xml' -print0 | xargs -0 -r grep -ro '
```
But I'm stuck trying to figure out how to obtain both attributes together. Any guidance on how to achieve this would be greatly appreciated!
3 Answers
I recommend trying `hxselect` or `xmlstarlet`, which are built for parsing XML and handle it far more reliably than grep and awk. Grep matches text line by line and knows nothing about XML structure, so it breaks as soon as attributes change order, wrap onto another line, or contain escaped characters. You should find the XML-aware tools easier to use as well.
If you're comfortable with some scripting, you could also explore using Python with libraries like `xml.etree.ElementTree` for parsing XML files. This way, you can extract both the `title` and `file` attributes programmatically. It might seem more complex at first, but it gives you great control and flexibility!
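To make that concrete, here is a minimal sketch using only the standard library. It assumes the layout from the question (`Video` elements directly under the root, each with `Media/Part` children); the function names `title_file_pairs` and `main` are made up for illustration, and the tab-separated output is just one possible format.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def title_file_pairs(xml_data):
    """Yield (title, file) for every Part found under each Video element."""
    # Assumption: the root element is <MediaContainer>, as in the question.
    root = ET.fromstring(xml_data)
    for video in root.findall("Video"):
        title = video.get("title")
        # A Video may hold several Parts; each one is paired with the same title.
        for part in video.findall("Media/Part"):
            yield title, part.get("file")


def main(top="."):
    """Walk `top` recursively and print one tab-separated pair per line."""
    for path in sorted(Path(top).rglob("*")):
        # Case-insensitive extension match, similar to `find -iname '*.xml'`.
        if not path.is_file() or path.suffix.lower() != ".xml":
            continue
        try:
            for title, file_ in title_file_pairs(path.read_bytes()):
                print(f"{title}\t{file_}")
        except ET.ParseError as err:
            # Report files that are not well-formed XML and keep going.
            print(f"skipping {path}: {err}", file=sys.stderr)
```

Calling `main(".")` from the directory that holds the XML files prints the pairs. Looping over `Part` inside each `Video` is what keeps every file tied to its own title, which is the part that is awkward to do with grep.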
You should consider using an XML-aware tool like xmlstarlet for this. It provides a `select` command that lets you do XPath queries directly from the shell, which would be perfect for your requirement. Here’s a command that should help you get the title and file in one go:
```
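# $'\t' is bash/zsh syntax for a literal tab; -n ends each record with a newline.
# Matching on Part and reaching up to its Video keeps each file paired with its own title.
find . -iname '*.xml' -print0 | xargs -0 xmlstarlet sel -T -t \
  -m /MediaContainer/Video/Media/Part \
  -v ../../@title -o $'\t' -v @file -n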
```
This will display each `title` with its corresponding `file` on the same line, separated by a tab! Give it a try!
Thanks for the suggestion! I’ll look into using Python for this if the command line tools don’t work out.