Hey everyone, I've been thinking about a classic chicken-and-egg problem in machine learning. ML models often need to recognize what they don't know, but here's the catch: to model uncertainty effectively, you need examples from those uncertain regions. The problem is that those regions are defined by the absence of data, so you can't learn anything from what you've never seen. How do we break this circularity?
I came across a formula: bc_x(r) = [N P(r | accessible)] / [N P(r | accessible) + P(r | inaccessible)]
To break this down for everyone:
- N represents the number of training samples (the certainty budget).
- P(r | accessible) is the learned density: how much training evidence sits near the current input r.
- P(r | inaccessible) is a uniform prior: every region I haven't encountered is treated as equally likely.
In simple terms, confidence can be thought of as the ratio of the evidence I've encountered to the total evidence (including what I haven't witnessed).
For instance, if the input is far from the training data, P(r | accessible) approaches 0, driving bc_x(r) toward 0, meaning "I know nothing." Conversely, if the input is close to the training examples, P(r | accessible) is large, pushing bc_x(r) toward 1, meaning "I'm sure."
The uniform prior P(r | inaccessible) requires no training (it's just a constant), whereas the density P(r | accessible) is learned only from positive examples. The competition between the two produces the uncertainty boundary.
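Here's a minimal sketch of that competition in plain NumPy, assuming a Gaussian kernel density estimate for P(r | accessible) and a uniform density over a bounded input range for P(r | inaccessible). Names like `bc` and `bandwidth` are illustrative, not taken from the MinimalSTLE code:

```python
import numpy as np

def bc(r, train, N=None, bandwidth=0.5, lo=-10.0, hi=10.0):
    """Confidence bc_x(r) = N*p_acc / (N*p_acc + p_inacc)."""
    train = np.asarray(train, dtype=float)
    if N is None:
        N = len(train)  # certainty budget = number of training samples
    # P(r | accessible): average of Gaussian kernels around training points
    diffs = (r - train) / bandwidth
    p_acc = np.mean(np.exp(-0.5 * diffs**2)) / (bandwidth * np.sqrt(2 * np.pi))
    # P(r | inaccessible): uniform over [lo, hi], independent of the data
    p_inacc = 1.0 / (hi - lo)
    return (N * p_acc) / (N * p_acc + p_inacc)

train = np.array([-1.0, 0.0, 0.5, 1.0])
print(bc(0.2, train))  # near the data -> close to 1
print(bc(8.0, train))  # far from the data -> close to 0
```

Because the uniform term is a constant floor in the denominator, confidence decays automatically wherever the learned density thins out; no negative examples are ever needed.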
If you want to see this in action, check out my GitHub for a zero-dependency NumPy demo! You can actually play around with it using the MinimalSTLE model.
2 Answers
Exactly! In ML, you can think of uncertainty as the unknown space where your model hasn't yet learned anything. When you're working with complex datasets, defining those areas without data is crucial. The relationship between what you've trained on and what you haven't creates that uncertainty boundary you're talking about. Super interesting stuff!
Great question! The "ignorance" in your model refers to regions of the input space that your training data doesn't represent. For example, if your model is trained on cat and dog images, everything outside that (like cars or random noise) falls into the ignorance category. So any data point the model hasn't seen anything like should map to high uncertainty.
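To make that concrete: fit a simple density to in-distribution features and measure how far a new point sits from it. This is a self-contained sketch with synthetic 2-D "image features" (the cat/dog/car framing is illustrative, not a real dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for feature vectors of in-distribution (cat/dog) images
features = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

mean = features.mean(axis=0)
cov = np.cov(features, rowvar=False)
inv_cov = np.linalg.inv(cov)

def mahalanobis(x):
    """Distance from the training distribution; large = unfamiliar input."""
    d = x - mean
    return float(np.sqrt(d @ inv_cov @ d))

print(mahalanobis(np.array([0.1, -0.2])))  # in-distribution: small
print(mahalanobis(np.array([8.0, 8.0])))   # OOD (a "car"): large
```

Anything the density wasn't trained on lands far from the fitted distribution, which is exactly the ignorance region the question is describing.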