I'm facing a challenge where I need to store data related to coupons and items. The relationship is quite one-sided. Here's the situation: a coupon can include multiple items, while an item can feature on several coupons simultaneously. We currently handle this manually, and I'm looking for a more efficient solution without breaking the bank.
To illustrate with an example:
- An item may be listed on up to a million coupons, and we're working with roughly 30 coupons at any point, covering around 2 to 10 million items.
- The most critical requirement is to accurately mark items as 'on sale' if they feature on any coupon, and unmark them when they're not. This must be reliable.
- I want to be able to upload new coupon files along with their associated items.
- Once a coupon is canceled, I need to remove it and its items from the sale status.
- Additionally, there's a frequently changing value called AnnoyingValueThatChanges that I need to track for each item, which adds complexity to writes.
- I've projected storage needs of around 20GB if we scale up.
The timing isn't crucial; only the developers will interact with this data. I'm considering various database options:
1. For a relational database, I was thinking of structuring it like:
- Coupon:
- Id
- Coupon_Item:
- CouponId
- ItemId
- Item:
- Id
- AnnoyingValueThatChanges
- OnSale
2. On the NoSQL side, a potential model could be:
- Coupons:
- Id:
- RelatedItemIds: [1 to 1 million]
- Items:
- Id:
- AnnoyingValueThatChanges
- OnSale
- RelatedCouponIds: [1-10]
I've been leaning towards Aurora Serverless for the relational approach as it seems cost-effective, and DynamoDB for NoSQL also looks appealing. What other cheap DB options does AWS offer, and do you see potential flaws in my models?
1 Answer
It sounds like you might be overthinking which database engine to use! Considering how important data accuracy is for your project, I’d suggest sticking to a SQL solution. Your proposed structure is solid. You can manage the state of whether an item is 'on sale' with a sub-query in your item's table instead of trying to maintain a separate state just for that. This could simplify your setup quite a bit. Plus, with the small number of coupons, proper indexing should handle your queries efficiently.
Got it! Just to clarify, in my real-life scenario, I plan on removing items completely when they aren't tied to any coupons anymore. The boolean state for 'on sale' actually comes from another application (like Salesforce) that doesn't allow for deletion as long as it's marked true. This system is meant to keep our data in sync with that. I understand SQL is the better route; my concern is just making sure I choose the right AWS services to keep costs down. That's what made me consider NoSQL in the first place.