# Covariate Shift
Covariate shift refers to the situation where the test inputs $X_{\text{test}}$ are drawn from a distribution $\mathcal{P}_{\text{test}}$ that differs from the training input distribution $\mathcal{P}$, but the relationship between $X_{\text{test}}$ and $Y_{\text{test}}$, i.e., the conditional distribution of $Y_{\text{test}} \mid X_{\text{test}}$, stays the same as in training.
In other words, the marginal distribution $P(X)$ shifts, while the conditional relationship between features and outcomes, $P(Y \mid X)$, remains the same across domains.
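Written with densities, covariate shift is a pair of conditions. This uses notation introduced here: $p$ for the training distribution and $p_{\text{test}}$ for the test distribution, assuming both have densities.

$$
p_{\text{test}}(x) \neq p(x), \qquad p_{\text{test}}(y \mid x) = p(y \mid x).
$$

The test joint distribution therefore combines the new input marginal with the unchanged training conditional. As a consequence, the label marginal generally changes too (replace the integral with a sum for discrete $x$):

$$
p_{\text{test}}(x, y) = p_{\text{test}}(x)\, p(y \mid x), \qquad p_{\text{test}}(y) = \int p(y \mid x)\, p_{\text{test}}(x)\, dx .
$$

Only the weighting over $x$ changes. The rule that maps each $x$ to a distribution over $y$ does not.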
## Examples
In each of the following examples, the input distribution changes while the conditional distribution of $Y \mid X$ stays the same:
**Medical Diagnosis**
• **X**: Patient features (age, symptoms, test results, demographics)
• **Y**: Diagnosis outcome (heart attack, no heart attack)
• **Training**: Model trained on patients from urban hospitals (younger, more diverse population)
• **Test**: Applied to rural hospitals (older, less diverse population)
• **X changes**: Age distribution shifts older, different ethnic composition
• **P(Y|X) stays same**: A 65-year-old with chest pain and elevated troponin has the same probability of a heart attack regardless of hospital location
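A small simulation can make the medical example concrete. Everything in the sketch below is hypothetical and chosen only for illustration, not taken from clinical data:

- the logistic coefficients in `p_heart_attack`,
- the age and troponin distributions (troponin is assumed independent of age),
- the population sizes.

Both populations draw $Y$ from the same `p_heart_attack`. Only the age distribution differs between them.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def p_heart_attack(age, troponin):
    """Shared conditional P(Y=1 | X). Hypothetical logistic coefficients,
    identical for urban and rural patients (the covariate-shift assumption)."""
    logit = -9.0 + 0.08 * age + 3.0 * troponin
    return 1.0 / (1.0 + np.exp(-logit))

def sample_patients(n, mean_age):
    """Draw n patients; only the covariate distribution P(X) depends on mean_age."""
    age = np.clip(np.rint(rng.normal(mean_age, 10.0, n)), 18, 100)  # whole years
    troponin = rng.exponential(0.5, n)  # same (hypothetical) distribution everywhere
    y = rng.random(n) < p_heart_attack(age, troponin)  # Y drawn from the shared P(Y | X)
    return age, troponin, y

def summarize(age, y, at_age=60):
    """Mean age, overall rate P(Y=1), and empirical P(Y=1 | age == at_age)."""
    mask = age == at_age
    return age.mean(), y.mean(), y[mask].mean()

urban_age, _, urban_y = sample_patients(200_000, mean_age=45.0)  # training: younger
rural_age, _, rural_y = sample_patients(200_000, mean_age=62.0)  # test: older

urban = summarize(urban_age, urban_y)
rural = summarize(rural_age, rural_y)
for name, (mean_age, base_rate, cond_rate) in [("urban", urban), ("rural", rural)]:
    print(f"{name}: mean age {mean_age:.1f}, P(Y=1) {base_rate:.3f}, "
          f"P(Y=1 | age=60) {cond_rate:.3f}")
```

The rural population is older, so its overall heart-attack rate is higher even though the conditional rule is unchanged. Among 60-year-olds, the empirical rate agrees across the two settings up to sampling noise.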
**Spam Detection**
• **X**: Email features (words, sender, links, formatting, urgency indicators)
• **Y**: Email classification (spam, not spam)
• **Training**: Emails from 2020 mentioning "COVID vaccines" and "work from home"
• **Test**: Emails from 2025 mentioning "AI tools" and "remote collaboration"
• **X changes**: Vocabulary and topics shift with current events
• **P(Y|X) stays same**: An email combining urgent language, suspicious links, and requests for personal information is spam with the same probability in 2020 and 2025
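The spam example can be sketched the same way. The topic names come from the example above. Everything else is invented for illustration:

- the per-flag spam probabilities in `P_SPAM_BY_FLAGS`,
- the topic mixes,
- the red-flag frequencies.

Both the topic mix and the red-flag frequency change between the two periods, and both are changes in $P(X)$. The label depends on an email only through its count of red flags.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

TOPICS = ["COVID vaccines", "work from home", "AI tools", "remote collaboration"]
# Hypothetical P(spam | number of red flags); identical in every period.
P_SPAM_BY_FLAGS = np.array([0.02, 0.15, 0.60, 0.95])

def sample_emails(n, topic_probs, flag_rate):
    """Draw n emails. Only P(X) (topic mix, red-flag frequency) depends on the period."""
    topics = rng.choice(len(TOPICS), size=n, p=topic_probs)
    # Columns: urgent language, suspicious link, request for personal info
    flags = (rng.random((n, 3)) < flag_rate).astype(int)
    n_flags = flags.sum(axis=1)
    # The label ignores the topic: Y depends on X only through the red flags
    y = rng.random(n) < P_SPAM_BY_FLAGS[n_flags]
    return topics, n_flags, y

def spam_rate_by_flags(n_flags, y):
    """Empirical P(spam | k red flags) for k = 0..3."""
    return np.array([y[n_flags == k].mean() for k in range(4)])

topics_2020, flags_2020, y_2020 = sample_emails(100_000, [0.45, 0.45, 0.05, 0.05], 0.2)
topics_2025, flags_2025, y_2025 = sample_emails(100_000, [0.05, 0.05, 0.45, 0.45], 0.3)

share_ai_2020 = np.mean(topics_2020 == TOPICS.index("AI tools"))
share_ai_2025 = np.mean(topics_2025 == TOPICS.index("AI tools"))
rates_2020 = spam_rate_by_flags(flags_2020, y_2020)
rates_2025 = spam_rate_by_flags(flags_2025, y_2025)
print(f"share of 'AI tools' emails: 2020 {share_ai_2020:.2f}, 2025 {share_ai_2025:.2f}")
print("P(spam | k flags) 2020:", np.round(rates_2020, 3))
print("P(spam | k flags) 2025:", np.round(rates_2025, 3))
```

The share of "AI tools" emails jumps between the two periods. Yet the empirical spam rate at each red-flag count agrees up to sampling noise, which is exactly what the covariate-shift assumption asserts.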
**House Price Prediction**
• **X**: Property features (size, location, bedrooms, school district, neighborhood type)
• **Y**: House price
• **Training**: Suburban houses in California
• **Test**: Urban apartments in New York
• **X changes**: Property types, neighborhood characteristics, size distributions
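• **P(Y|X) stays same**: A 2-bedroom, 1000 sq ft property with good schools nearby follows the same feature-to-price relationship in both markets; since location is itself one of the features, its effect on price is part of that shared relationship, and only the mix of properties changes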