I'm in the middle of creating an ETL process to extract notes from Evernote ENML and I'm weighing my options between using Beautiful Soup or the standard library's xml.etree.ElementTree. I've heard that Beautiful Soup is easier to use, but I've also read that the standard library tends to be faster. This is making me lean toward the standard library approach. Is there any compelling reason I should choose Beautiful Soup instead?
3 Answers
If you're mainly dealing with XML, you're probably better off with xml.etree.ElementTree for performance. Beautiful Soup is aimed more at messy HTML, so if speed is your priority, go with the standard library! If you ever do need to parse HTML, keep in mind that different parsers (lxml's included) can repair broken real-world markup in different ways, so choose wisely!
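To make the "messy HTML" point concrete, here's a rough sketch of how strict the standard library parser is. The helper name parses_as_xml and both sample strings are made up for illustration; they aren't from a real Evernote export.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if ElementTree accepts the markup as well-formed XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Hypothetical samples: the first is well-formed, the second leaves <br> unclosed,
# which browsers tolerate but a strict XML parser rejects.
well_formed = "<en-note><div>Hello<br/></div></en-note>"
messy_html = "<div>Hello<br></div>"

print(parses_as_xml(well_formed))
print(parses_as_xml(messy_html))
```

If your input always looks like the first sample, the strictness is a feature: a failed parse is an early warning that a note is corrupt.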
Yeah, I noticed that too! It seems like the standard library will handle my cases pretty well.
Honestly, the standard xml library is pretty decent and has solid filtering capabilities (find, findall, and iter, with a limited XPath subset). Sure, it's not typed, but for your ETL needs, it might be just right. I feel like Beautiful Soup is more suited for web scraping and testing front ends, so it could be overkill for just handling ENML.
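As a quick illustration of the filtering, here's a minimal sketch using findall with an attribute predicate. The sample markup and hash values are invented; en-media and its type/hash attributes are the only ENML details assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical note body with three attachments, one of them nested in a div.
root = ET.fromstring(
    "<en-note>"
    '<en-media type="image/png" hash="aaa"/>'
    '<en-media type="application/pdf" hash="bbb"/>'
    '<div><en-media type="image/png" hash="ccc"/></div>'
    "</en-note>"
)

# ".//" searches all descendants; the [@type='...'] predicate filters by attribute.
png_hashes = [
    el.get("hash") for el in root.findall(".//en-media[@type='image/png']")
]
print(png_hashes)
```

The XPath support is only a subset, so anything fancier than tag, position, and attribute matching will probably need a plain Python filter on top.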
I've had good experiences with xml.etree.ElementTree. It's worked well for a variety of XML data sources in the past, so if you're sticking to structured XML like ENML, you should be good to go!
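For what it's worth, a minimal sketch of the extract step might look something like this. SAMPLE_ENML and extract_note are hypothetical names, and the output dict shape is just one guess at what an ETL job would want.

```python
import xml.etree.ElementTree as ET

# Hypothetical ENML sample; a real one would come from your Evernote export.
SAMPLE_ENML = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE en-note SYSTEM "http://xml.evernote.com/pub/enml2.dtd">
<en-note>
  <div>Buy milk</div>
  <div><en-todo checked="true"/>Call the bank</div>
  <en-media type="image/png" hash="abc123"/>
</en-note>"""


def extract_note(enml: str) -> dict:
    """Pull plain text, to-do states, and attachment hashes out of one note."""
    # Encode to bytes so the parser honors the XML declaration's encoding.
    root = ET.fromstring(enml.encode("utf-8"))
    # itertext() walks every text and tail node; drop whitespace-only chunks.
    chunks = (chunk.strip() for chunk in root.itertext())
    text = " ".join(chunk for chunk in chunks if chunk)
    todos = [el.get("checked") == "true" for el in root.iter("en-todo")]
    media_hashes = [el.get("hash") for el in root.iter("en-media")]
    return {"text": text, "todos": todos, "media_hashes": media_hashes}


note = extract_note(SAMPLE_ENML)
print(note)
```

I'd wrap the fromstring call in a try/except ET.ParseError in a real pipeline so one bad note doesn't stop the whole run.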

Thanks for the advice! Since performance matters and ENML is a variant of XML, sticking to xml.etree sounds smart.