I'm looking for advice on how to store data regarding coupons and the items associated with them. The challenge is that we have one-sided relationships where an item can appear on multiple coupons, and each coupon can include anywhere from 1 to a million items. Currently, this process is managed manually, but we're trying to automate it, ideally at the lowest cost.
Here's a breakdown of my requirements:
- There are about 30 coupons at any time and potentially 2-10 million items.
- I need to be able to mark an item as 'on sale' if it's on any coupon and unmark it when it no longer appears on any. This must be accurate.
- I want to upload files of new coupons and their items.
- I need a way to cancel a coupon, removing its items and marking those as 'not on sale'.
- There's a frequently changing value on items called 'AnnoyingValueThatChanges' that I need to consider for writes.
- The estimated data size could reach 20GB if we expand significantly.
I've drafted a relational database model but I'm also considering NoSQL. My relational structure looks like this:
- Coupon table (with Id)
- Junction table (with CouponId and ItemId)
- Item table (with Id, AnnoyingValueThatChanges, OnSale)
Alternatively, for NoSQL, I would have:
- Coupons (Id with RelatedItemIds)
- Item (Id with AnnoyingValueThatChanges, OnSale, RelatedCouponIds)
DynamoDB on-demand capacity seems like a potential fit. I'd love to hear other cheap AWS options or feedback on my models!
1 Answer
It sounds like you might be overthinking your choice of database! Given that you're more focused on accuracy than high performance or availability, a SQL solution should work just fine. You can structure it like this:
- `coupons` table with fields like `id` and any other relevant metadata.
- `coupon_items` table linking `coupon_id` to `item_id`.
- `items` table with `id` and any other necessary fields.
By querying the `items`, you can easily determine if an item is on sale based on whether it appears in the `coupon_items` table. No need for additional status management. Proper indexing should help you get good performance given your constraints.

Just a heads up, in my actual plan, I’ll remove the 'item' entirely when there are no coupons linked. Also, I need to keep track of the 'on sale' boolean value since it’s linked to Salesforce, and I can't directly fetch from the database at delete time. I'm leaning towards SQL, but I want to ensure I keep costs down on AWS products. That's the main reason I was even considering NoSQL.