[Reels] The simple theoretical background of Domain generalization
BayesianBacteria 2024. 4. 30. 00:46

Here I introduce the generalization error bound for the domain generalization problem, in which the test domain (or style, sometimes) differs from the training domains.

Preliminaries
Notations
- $\mathbf{x} \in \mathcal{X} \subset \mathbb{R}^d$, $y \in \mathcal{Y} \subset \mathbb{R}$: common input and target spaces
- $P^i_{XY}$: data distribution of the $i$-th domain
- $S^i \sim P^i_{XY}$: samples from the $i$-th domain
- $\epsilon^i$: risk on the $i$-th domain
(Definition) Domain generalization
Assume that:
- We are given $M$ training (source) domains $S_{\text{train}} = \{S^i \mid i = 1, \dots, M\}$
- The joint distributions of each pair of domains differ: $P^i_{XY} \neq P^j_{XY}$, $1 \leq i \neq j \leq M$
Then our goal is to learn a robust and generalizable function $h: \mathcal{X} \to \mathcal{Y}$ such that
$$\min_h \mathbb{E}_{(\mathbf{x}, y) \in S_{\text{test}}}\left[l(h(\mathbf{x}), y)\right]$$
where $l$ is a loss function and $S_{\text{test}}$ is a target (unseen) domain.
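To make the setup concrete, below is a minimal sketch (not from the post) of the leave-one-domain-out protocol implied by this definition: pool $M$ source domains, fit a plain ERM baseline, and evaluate on a held-out domain with a different shift. The toy data generator and the logistic-regression baseline are purely hypothetical choices for illustration.

```python
# Minimal sketch (not from the post): leave-one-domain-out evaluation of the
# domain generalization setup. Domain shifts and the ERM baseline are toy choices.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_domain(shift, n=500, d=5):
    """Toy domain: shared labeling rule, domain-specific covariate shift."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, d))
    y = (X.sum(axis=1) > d * shift).astype(int)  # same concept, shifted inputs
    return X, y

# M = 3 source domains and one unseen target domain with a different shift.
sources = [make_domain(s) for s in (0.0, 0.5, 1.0)]
X_test, y_test = make_domain(2.0)  # outside the range of source shifts

# Simplest h: pool all source domains and run ERM (logistic regression).
X_train = np.vstack([X for X, _ in sources])
y_train = np.concatenate([y for _, y in sources])
h = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("source accuracy:", h.score(X_train, y_train))
print("unseen-domain accuracy:", h.score(X_test, y_test))
```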
(Definition) $\mathcal{H}$-divergence
The role of the $\mathcal{H}$-divergence is to measure how different two domains are. The $\mathcal{H}$-divergence between domains $P^s_X$ and $P^t_X$ is defined as follows:
$$d_{\mathcal{H}\Delta\mathcal{H}}(P^s_X, P^t_X) := \sup_{h, h' \in \mathcal{H}} \left| \epsilon^s(h, h') - \epsilon^t(h, h') \right|$$
where
$$\epsilon^s(h, h') = \mathbb{E}_{\mathbf{x} \sim P^s_X}\left[\mathbb{1}\{h(\mathbf{x}) \neq h'(\mathbf{x})\}\right] = \mathbb{E}_{\mathbf{x} \sim P^s_X}\left[\left|h(\mathbf{x}) - h'(\mathbf{x})\right|\right]$$
Here, $h, h'$ are arbitrary classifiers in $\mathcal{H}$, not estimators trained via empirical risk minimization.
However, since directly accessing the distribution $P^i$ of domain $i$ is intractable, we can instead use the empirical $\mathcal{H}$-divergence computed from unlabeled samples $U^i$ of domain $i$. Thus, the empirical $\mathcal{H}$-divergence between domains $s$ and $t$ is:
$$\hat{d}_{\mathcal{H}\Delta\mathcal{H}}(U^s, U^t) := \sup_{h, h' \in \mathcal{H}} \left| \epsilon_{U^s}(h, h') - \epsilon_{U^t}(h, h') \right|$$
where $\epsilon_{U^s}$ is the empirical risk on the dataset $U^s$.
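In practice, the $\sup$ over $\mathcal{H}$ is not computed directly. A common surrogate is the proxy $\mathcal{A}$-distance of Ben-David et al.: train a domain discriminator to separate the unlabeled sets $U^s$ and $U^t$ and convert its held-out error into a divergence estimate. The sketch below is a hedged illustration of that surrogate on toy Gaussian domains, not the exact quantity defined above.

```python
# Hedged sketch: approximate the domain divergence via the proxy A-distance,
# 2 * (1 - 2 * err), where err is the held-out error of a domain discriminator
# trained on unlabeled samples from the two domains. Data and model are toy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def proxy_a_distance(U_s, U_t):
    """Estimate how separable two unlabeled domains are (0 = identical, 2 = fully separable)."""
    X = np.vstack([U_s, U_t])
    d = np.concatenate([np.zeros(len(U_s)), np.ones(len(U_t))])  # domain labels
    X_tr, X_te, d_tr, d_te = train_test_split(X, d, test_size=0.5, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, d_tr)
    err = 1.0 - clf.score(X_te, d_te)   # domain-classification error
    return 2.0 * (1.0 - 2.0 * err)

rng = np.random.default_rng(0)
U_s = rng.normal(0.0, 1.0, size=(1000, 5))
U_t = rng.normal(0.8, 1.0, size=(1000, 5))
print(proxy_a_distance(U_s, U_t))
```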
Domain generalization error bound

(Assumption) The target domain lies in the convex hull of the source domain distributions.
Let $\Delta_M$ be the $(M-1)$-dimensional simplex. We assume that our target domain $P^t_X$ lies within the convex hull $\Lambda := \{\sum^M_{i=1} \pi_i P^i_X \mid \pi \in \Delta_M\}$.
Under the above assumption, the following theorem holds.
(Theorem) Domain generalization error bound.
Let $\gamma := \min_{\pi \in \Delta_M} d_{\mathcal{H}}(P^t_X, \sum^M_{i=1} \pi_i P^i_X)$, with minimizer $\pi^*$, be the distance of $P^t_X$ from the convex hull $\Lambda$, and let $P^*_X := \sum^M_{i=1} \pi^*_i P^i_X$ be its best approximator within $\Lambda$. Let $\rho := \sup_{P'_X, P''_X \in \Lambda} d_{\mathcal{H}}(P'_X, P''_X)$ be the diameter of $\Lambda$. Then it holds that:
$$\epsilon^t(h) \leq \sum^M_{i=1} \pi^*_i \epsilon^i(h) + \frac{\gamma + \rho}{2} + \lambda_{\mathcal{H},(P^t_X, P^*_X)}$$
where $\lambda_{\mathcal{H},(P^t_X, P^*_X)}$ is the ideal joint risk across the target domain and the domain with the best approximator distribution $P^*_X$:
$$\lambda_{\mathcal{H},(P^t_X, P^*_X)} = \inf_{h \in \mathcal{H}} \left[ \mathbb{E}_{(\mathbf{x}, y) \sim P^*_X}\left[l(h(\mathbf{x}), y)\right] + \mathbb{E}_{(\mathbf{x}, y) \sim P^t_X}\left[l(h(\mathbf{x}), y)\right] \right]$$
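To see how the terms combine, here is a toy numeric instantiation of the bound. All numbers ($\pi^*$, $\epsilon^i$, $\gamma$, $\rho$, $\lambda$) are made up for illustration only and do not come from the post.

```python
# Toy numeric reading of the bound with hypothetical values.
import numpy as np

pi_star = np.array([0.5, 0.3, 0.2])     # best mixture weights over M = 3 sources
eps_src = np.array([0.08, 0.12, 0.10])  # per-source risks eps^i(h)
gamma = 0.05                            # distance of P^t_X from the convex hull
rho = 0.20                              # diameter of the convex hull
lam = 0.02                              # ideal joint risk lambda

bound = pi_star @ eps_src + (gamma + rho) / 2 + lam
print(f"target risk is bounded by {bound:.3f}")  # 0.096 + 0.125 + 0.02 = 0.241
```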
What does this theorem state?
The above theorem gives us insight into several ways to reduce the domain generalization error.
- Reducing the risk of each source domain reduces the bound ($\epsilon^i$).
- If the target domain $P^t$ can be easily approximated by a mixture of the source domains, the bound is tightened ($\gamma$).
- Reducing the distance between the source domains reduces the bound ($\rho$).
(1) motivates us to learn well-generalized models for each domain; (2) and (3) motivate us to learn domain-invariant representations (a rough sketch follows after the note below).
Note that $\epsilon^i$ is not an empirical risk of domain $i$.
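As a rough, hypothetical sketch of points (2) and (3), the snippet below trains a shared feature extractor on several source domains while penalizing the spread between their feature distributions, which is one way to try to shrink $\rho$. The mean-matching penalty is a crude stand-in for proper divergence estimates, and all architectural choices and weights are arbitrary.

```python
# Hedged PyTorch sketch: per-domain ERM plus a crude feature-alignment penalty
# (first-moment matching across source domains) as a proxy for shrinking rho.
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Linear(32, 16))
clf = nn.Linear(16, 2)
opt = torch.optim.Adam(list(feat.parameters()) + list(clf.parameters()), lr=1e-3)
ce = nn.CrossEntropyLoss()

def alignment_penalty(features_per_domain):
    """Mean pairwise squared distance between domain-wise feature means."""
    means = [f.mean(dim=0) for f in features_per_domain]
    pens = [torch.sum((means[i] - means[j]) ** 2)
            for i in range(len(means)) for j in range(i + 1, len(means))]
    return torch.stack(pens).mean()

# Toy batches from M = 3 source domains (random data for illustration only).
domains = [(torch.randn(64, 5) + s, torch.randint(0, 2, (64,))) for s in (0.0, 0.5, 1.0)]

for step in range(100):
    feats = [feat(X) for X, _ in domains]
    erm = torch.stack([ce(clf(f), y) for f, (_, y) in zip(feats, domains)]).mean()
    loss = erm + 0.1 * alignment_penalty(feats)  # (1) low per-domain risk + (3) invariance
    opt.zero_grad()
    loss.backward()
    opt.step()
```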
Limitation of the theorem
(Personal opinion) I do not think the within-convex-hull assumption is always reasonable. If the source domains are far enough from the target domain, couldn't the target domain $P^t$ lie outside of the convex hull?
However, if we regard the outside of the convex hull as a "completely different task" from the source domains, it may be a reasonable assumption. For instance, in an animal-image classification task, it makes sense for the target domain to be cartoon images of dogs. But cartoon images of refrigerators are not even within the model's objective of animal-image classification. Loosely speaking, this can be an example of the "outside of the convex hull", which implies a completely different task.