I'm really curious about OpenAI's approach to A/B testing in Temporary Chat mode. The policy states that these chats won't be used for training the models, but it seems like they still collect feedback on different response versions. If they're not saving or analyzing the chats for future models, why go through the process of A/B testing in the first place?
6 Answers
Keep in mind, the A/B testing is primarily about versions, not the specific answers each model provides. They're refining the overall experience based on user feedback.
It's a bit unclear how they're using the data, but it seems like they're just gathering stats on user preferences for different model versions. This might not require saving chat histories. Essentially, they could be testing which model is preferred without actually training future models with that data.
Exactly! I doubt they're sharing the specific inputs or outputs; it’s likely just a comparison of which model worked better.
Honestly, it feels a bit deceptive. If they're not using it for training, what exactly are they doing with the data? Isn’t a temporary chat just not saved for the user, but they can still keep a copy for up to 30 days for safety?
A/B testing in those chats probably helps them improve features and the user experience before a wider rollout. It's like getting real-time feedback without making permanent changes, making sure updates actually enhance the overall experience.
The phrase "not used to train models" is really vague. While users might think it means their chats aren't analyzed, it could actually mean the data just isn't part of the specific training set. They might use it for other analyses related to user satisfaction instead.
When you provide feedback through A/B tests or use the thumbs-up/down, that seems to overwrite any data settings you might have. So even if a chat is temporary, there might be data logged as a feedback instance that OpenAI uses for statistical analysis.
That's a really good point! It’s tricky since we never know which model we're interacting with; it feels like they switch based on the prompt.