# Covariate Shift

Covariate shift refers to the situation where the distribution of the test inputs $X_{\text{test}}$ changes from the training distribution $\mathcal{P}$ to $\mathcal{P}_{\text{test}}$, but the relationship between $X_{\text{test}}$ and $Y_{\text{test}}$, i.e. the conditional distribution of $Y_{\text{test}} \mid X_{\text{test}}$, stays fixed.

In other words, the marginal distribution $P(X)$ shifts between training and test, while the conditional distribution $P(Y \mid X)$ that links features to outcomes is the same in both domains.

## Examples

Examples where the input distribution changes but the conditional relationship $P(Y \mid X)$ stays the same:

**Medical Diagnosis**

- **X**: Patient features (age, symptoms, test results, demographics)
- **Y**: Diagnosis outcome (heart attack, no heart attack)
- **Training**: Model trained on patients from urban hospitals (younger, more diverse population)
- **Test**: Applied to rural hospitals (older, less diverse population)
- **X changes**: The age distribution shifts older and the ethnic composition differs
- **P(Y|X) stays the same**: A 65-year-old with chest pain and elevated troponin has the same probability of a heart attack regardless of hospital location

**Spam Detection**

- **X**: Email features (words, sender, links, formatting, urgency indicators)
- **Y**: Email classification (spam, not spam)
- **Training**: Emails from 2020 mentioning "COVID vaccines" and "work from home"
- **Test**: Emails from 2025 mentioning "AI tools" and "remote collaboration"
- **X changes**: Vocabulary and topics shift with current events
- **P(Y|X) stays the same**: Urgent language, suspicious links, and requests for personal information still indicate spam with the same probability

**House Price Prediction**

- **X**: Property features (size, location, bedrooms, school district, neighborhood type)
- **Y**: House price
- **Training**: Suburban houses in California
- **Test**: Urban apartments in New York
- **X changes**: Property types, neighborhood characteristics, and size distributions
- **P(Y|X) stays the same**: Location is itself one of the features in X, so a 2-bedroom, 1000 sq ft property with good schools nearby in a given neighborhood has the same price distribution whether it appears in the training or the test data; what shifts is how common each combination of features is
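The definition can be made concrete with a small discrete sketch based on the medical example. The probabilities are made up for illustration (they are not clinical estimates), and the names `AGE_GROUPS`, `P_Y_GIVEN_X`, `P_X_URBAN` and `P_X_RURAL` are hypothetical. Because the joint distribution factorizes as $P(X, Y) = P(X)\,P(Y \mid X)$, the two domains here share the second factor and differ only in the first. Recovering $P(Y \mid X)$ from either joint table gives the same values, yet the overall label rate $P(Y = 1)$ still differs, since it averages the shared conditional over a different mix of ages.

```python
import math

# Hypothetical numbers for illustration only; not clinical estimates.
AGE_GROUPS = ["18-39", "40-64", "65+"]

# Shared conditional P(Y = heart attack | X = age group): identical in both domains.
P_Y_GIVEN_X = {"18-39": 0.01, "40-64": 0.05, "65+": 0.15}

# Domain-specific marginals P(X): the rural population skews older.
P_X_URBAN = {"18-39": 0.50, "40-64": 0.35, "65+": 0.15}
P_X_RURAL = {"18-39": 0.20, "40-64": 0.40, "65+": 0.40}


def joint(p_x, p_y_given_x):
    """Build the joint table P(X, Y) = P(X) * P(Y | X) for binary Y."""
    table = {}
    for x in AGE_GROUPS:
        table[(x, 1)] = p_x[x] * p_y_given_x[x]
        table[(x, 0)] = p_x[x] * (1.0 - p_y_given_x[x])
    return table


def conditional_from_joint(table):
    """Recover P(Y = 1 | X = x) by dividing the joint by the marginal P(X = x)."""
    return {x: table[(x, 1)] / (table[(x, 0)] + table[(x, 1)]) for x in AGE_GROUPS}


def label_rate(table):
    """Marginal P(Y = 1), summed over all values of X."""
    return sum(table[(x, 1)] for x in AGE_GROUPS)


urban = joint(P_X_URBAN, P_Y_GIVEN_X)
rural = joint(P_X_RURAL, P_Y_GIVEN_X)

cond_urban = conditional_from_joint(urban)
cond_rural = conditional_from_joint(rural)

for x in AGE_GROUPS:
    # P(Y | X) agrees across domains (up to floating-point rounding).
    assert math.isclose(cond_urban[x], cond_rural[x])

print(f"P(Y=1) urban: {label_rate(urban):.3f}")  # 0.045
print(f"P(Y=1) rural: {label_rate(rural):.3f}")  # 0.082
```

A change in the overall positive rate between domains is therefore not, on its own, evidence that $P(Y \mid X)$ has changed.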
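In practice only finite samples from each domain are observed. The following is a minimal simulation under stated assumptions: the feature set `X = (age_group, troponin_elevated)`, the table `P_Y_GIVEN_X`, the helper names, and all numbers are hypothetical. Labeled patients are drawn for an urban (younger) and a rural (older) domain from the same `P_Y_GIVEN_X`, so only the age mix differs. The share of older patients differs sharply between the two samples, while the empirical estimate of $P(Y = 1 \mid X = x)$ for each cell agrees across domains up to sampling noise.

```python
import random
from collections import Counter

# Hypothetical labeling rule P(Y = 1 | X), shared by both domains.
# X = (age_group, troponin_elevated); the numbers are illustrative only.
P_Y_GIVEN_X = {
    ("under_65", False): 0.02,
    ("under_65", True): 0.40,
    ("65_plus", False): 0.08,
    ("65_plus", True): 0.70,
}


def sample_domain(n, p_older, rng):
    """Draw n labeled patients; only the age-group mix (p_older) differs by domain."""
    rows = []
    for _ in range(n):
        age_group = "65_plus" if rng.random() < p_older else "under_65"
        troponin = rng.random() < 0.3  # same troponin rate in every domain
        x = (age_group, troponin)
        y = rng.random() < P_Y_GIVEN_X[x]  # label drawn from the shared conditional
        rows.append((x, y))
    return rows


def estimate_conditional(rows):
    """Empirical P(Y = 1 | X = x) for every x that appears in the sample."""
    counts = Counter(x for x, _ in rows)
    positives = Counter(x for x, y in rows if y)
    return {x: positives[x] / counts[x] for x in counts}


rng = random.Random(0)
train = sample_domain(50_000, p_older=0.2, rng=rng)  # urban: younger
test = sample_domain(50_000, p_older=0.6, rng=rng)   # rural: older

# P(X) shifts: the fraction of older patients differs between domains.
share_older_train = sum(x[0] == "65_plus" for x, _ in train) / len(train)
share_older_test = sum(x[0] == "65_plus" for x, _ in test) / len(test)
print(f"share 65+: train {share_older_train:.2f}, test {share_older_test:.2f}")

# P(Y | X) does not shift: per-cell label rates match up to sampling noise.
cond_train = estimate_conditional(train)
cond_test = estimate_conditional(test)
for x in sorted(P_Y_GIVEN_X):
    print(x, f"train {cond_train[x]:.3f}", f"test {cond_test[x]:.3f}")
```

The features are kept discrete so that conditioning on $X$ is exact. Binning a continuous age instead would compare groups whose within-bin age distributions differ between domains, which can make the binned label rates disagree even when $P(Y \mid X)$ is truly unchanged.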