I'm currently doing exploratory data analysis on a big survey dataset, and I've come across a column labeled 'Age' that contains categories like 'under 18 years old', '18-24 years old', '24-31 years old', etc. I'd like to calculate the mean age, but since these values are stored as strings (object dtype) rather than numbers, it's been tricky! I've considered assigning an average age to each category and using those to find the overall average. What suggestions do you have for handling this situation? Thanks in advance for your help!
5 Answers
To make it easier, you could create a lookup table (a dictionary, for example) that maps each age range to a representative number. For example, turn 'under 18' into 17, '18-24' into 21, and so on. Once every row has a number, you can calculate means or medians without much fuss.
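A minimal sketch of that lookup approach, assuming pandas and a column named 'Age' as in the question. The sample rows and the numbers in `AGE_ESTIMATES` are illustrative guesses, not values from the actual survey.

```python
import pandas as pd

# Hypothetical sample; replace with the real survey DataFrame.
df = pd.DataFrame({"Age": ["under 18 years old", "18-24 years old",
                           "24-31 years old", "18-24 years old"]})

# Assumed representative age per bracket (illustrative, adjust as needed).
AGE_ESTIMATES = {
    "under 18 years old": 17,
    "18-24 years old": 21,
    "24-31 years old": 27.5,
}

# Series.map looks each label up in the dict; unmapped labels become NaN.
df["age_estimate"] = df["Age"].map(AGE_ESTIMATES)

mean_age = df["age_estimate"].mean()      # NaN values are skipped by default
median_age = df["age_estimate"].median()
```

Keeping the estimate in a new column leaves the original brackets untouched, so both views of the data stay available.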
Honestly, I wouldn't change those buckets! Keeping them allows for more flexible analysis later on. However, if you really need an average, consider finding the midpoints of each range and using that as a basis for your mean/median calculations.
Open-ended categories like 'under 18 years old' are tricky. You might need to assign a sensible estimate, like 16, for analysis, but be consistent with your approach! For closed ranges, take the midpoint as a fair estimate. Just keep in mind that the result is an approximation, since the actual ages within each bracket are unknown.
It sounds like you're dealing with ordinal data with those age brackets. Unfortunately, you can't compute an exact average age from those categories directly. However, you can look at the distribution, or find the bracket that contains the median respondent. Focus on counting how many fall into each bracket for a clearer picture!
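As a rough sketch of the counting approach, using only the standard library: the bracket labels come from the question, while the `responses` list and the helper name `median_bracket` are made up for illustration.

```python
from collections import Counter

# Brackets listed from youngest to oldest; responses are made-up sample data.
BRACKET_ORDER = ["under 18 years old", "18-24 years old", "24-31 years old"]
responses = ["18-24 years old", "under 18 years old", "24-31 years old",
             "18-24 years old", "24-31 years old"]

counts = Counter(responses)

def median_bracket(counts, order):
    """Return the first bracket whose cumulative count reaches half the total.

    With an even total that splits exactly between two brackets,
    this returns the lower of the two.
    """
    total = sum(counts[label] for label in order)
    if total == 0:
        raise ValueError("no responses to summarise")
    running = 0
    for label in order:
        running += counts[label]
        if running * 2 >= total:
            return label

# Counts per bracket, in age order rather than frequency order.
distribution = {label: counts[label] for label in BRACKET_ORDER}
middle = median_bracket(counts, BRACKET_ORDER)
```

This needs no guess about ages inside a bracket, only the ordering of the brackets themselves.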
You can't really convert 'under 18' to a specific integer, because it's a condition, not a number. But for analysis, taking the midpoint of each range can work well! Just remember, for open-ended ranges like '65 and older', you'd have to decide on a reasonable upper limit for your calculations.
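Here is a rough sketch of turning bracket labels into one representative age each, including the open-ended ones. The function name `bracket_midpoint` and the bounds `ASSUMED_MIN_AGE` and `ASSUMED_MAX_AGE` are assumptions chosen for illustration; the right limits depend on who the survey actually reached.

```python
import re

# Assumed bounds for open-ended brackets; both are analyst choices.
ASSUMED_MIN_AGE = 14   # hypothetical floor for "under N"
ASSUMED_MAX_AGE = 80   # hypothetical ceiling for "N and older"

def bracket_midpoint(label):
    """Estimate a single age for a bracket label via its midpoint."""
    numbers = [int(n) for n in re.findall(r"\d+", label)]
    text = label.lower()
    if len(numbers) == 2:                        # e.g. "18-24 years old"
        low, high = numbers
    elif len(numbers) == 1 and "under" in text:  # e.g. "under 18 years old"
        low, high = ASSUMED_MIN_AGE, numbers[0]
    elif len(numbers) == 1 and "older" in text:  # e.g. "65 and older"
        low, high = numbers[0], ASSUMED_MAX_AGE
    else:
        raise ValueError(f"unrecognised bracket: {label!r}")
    return (low + high) / 2
```

Raising on unrecognised labels, instead of silently returning a default, makes it obvious when the survey contains a bracket the rules above did not anticipate.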
Thanks for the tip! That's a good point about using an upper limit for the upper age brackets.

I appreciate the insight! I’ll be sure to apply the same method consistently throughout.