How to Extract Features from E-commerce Product Titles?

Asked By CuriousCoder99

Hey everyone! I'm diving into building a product classifier for e-commerce listings and could use some advice on extracting specific features from product titles. For instance, I'm trying to find a way to identify the number of doors in wardrobes from titles like: 'BRAND X Kayden Engineered Wood 3 Door Wardrobe for Clothes' and 'BRAND X Kayden Engineered Wood 5 Door Wardrobe for Clothes'. I want to differentiate between these products based on the number of doors. I'm considering different methods like regex for extraction, tokenization with attention models, fine-tuning a small transformer model, or using dependency parsing to link numerals with the correct features. Has anyone dealt with something similar? I'd really appreciate your thoughts on what worked for you, whether you recommend a rule-based, ML-based, or a mixture of both approaches, and how you tackle other attributes like material and color. Thanks a lot!

2 Answers

Answered By DataDynamo21

First off, what's the scale of this project? If it’s a personal venture, I’d suggest trying a small LLM that doesn't require fine-tuning. Something like 4.1-nano should give you quick, decent results without breaking the bank or adding much overhead. If reliability matters more, I’d explore whether clustering similar products before classification helps. Grouping similar items before passing them on to your model could be a smart way to reduce error rates!
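To make the grouping idea concrete, here is a minimal sketch in Python. It is not real clustering: it swaps in a much cruder technique, exact-match grouping on a normalized title in which every number is masked, so that listings differing only in a numeral (such as the 3 Door and 5 Door wardrobes) land in the same bucket. The names `template_key` and `group_titles` and the `<num>` placeholder are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict


def template_key(title):
    """Return a normalized key: lowercase, digits masked, whitespace collapsed."""
    # Assumption: numerals are the only part that varies between near-identical listings.
    masked = re.sub(r"\d+", "<num>", title.lower())
    return " ".join(masked.split())


def group_titles(titles):
    """Bucket titles that share the same masked template."""
    groups = defaultdict(list)
    for title in titles:
        groups[template_key(title)].append(title)
    return dict(groups)


titles = [
    "BRAND X Kayden Engineered Wood 3 Door Wardrobe for Clothes",
    "BRAND X Kayden Engineered Wood 5 Door Wardrobe for Clothes",
    "BRAND Y Metal Shoe Rack",
]
groups = group_titles(titles)
```

Each bucket can then be handed to the model or to extraction rules as a unit. For noisier titles, a proper clustering method over text embeddings would likely be needed instead of this exact-match key.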

CuriousCoder99 -

Thanks for the insights! The feedback about using clustering makes sense for improving accuracy. I'm keen to see how it performs as I scale up the number of titles.

Answered By TechieTribe

If you want a robust solution, I’d recommend a hybrid approach that combines regex with machine learning. Regex can quickly grab straightforward attributes like '3 Door' or '5 Door.' For other attributes, like materials, ML might be better for understanding context. You could always start with simple rule-based logic and then enhance it as you see what works with your data.
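As a rough illustration of the rule-based starting point, here is a small Python sketch. The regex and the material list are assumptions made for this example, not a tested production pattern: `DOOR_RE` looks for a one- or two-digit number directly before "door" or "doors", and `MATERIALS` is a hypothetical lexicon you would extend from your own catalogue. Both functions return `None` when nothing matches, which is a natural point to fall back to an ML model.

```python
import re

# Assumed pattern: a 1-2 digit number, optional spaces or hyphen, then "door(s)".
DOOR_RE = re.compile(r"\b(\d{1,2})[\s-]*doors?\b", re.IGNORECASE)

# Hypothetical lexicon; list longer phrases before shorter ones they might contain.
MATERIALS = ("engineered wood", "solid wood", "metal", "plastic")


def extract_door_count(title):
    """Return the door count as an int, or None if the title does not state one."""
    match = DOOR_RE.search(title)
    return int(match.group(1)) if match else None


def extract_material(title):
    """Return the first known material found in the title, or None."""
    lowered = title.lower()
    for material in MATERIALS:
        if material in lowered:
            return material
    return None


title = "BRAND X Kayden Engineered Wood 3 Door Wardrobe for Clothes"
features = {
    "doors": extract_door_count(title),
    "material": extract_material(title),
}
```

Titles that spell the number out ("Three Door") or phrase it differently would slip past this pattern, so it is worth logging the `None` cases to see which rules or model fallbacks are actually needed.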

CuriousCoder99 -

That sounds like a solid plan! Starting simple and iterating based on results could really help optimize the output as I go along.
