1. Reply from Prof. Zhang

This is not a good research idea. When we average models for measurement error data, the question of how to make predictions from such data is always easy to challenge (notice that future data will still contain measurement errors), whereas the advantage of model averaging lies precisely in prediction.

In other words, model averaging regression for this kind of data is of little use if we cannot predict the response from error-contaminated covariates under a given model.

It also implies that model averaging is designed for prediction purposes among regression models, so it may be naturally difficult to develop model averaging versions of unsupervised learning tasks.

Before we continue, we should review what kinds of research on measurement error data have been done.

2. Review

2.1. Ordinary cases

2.1.1. Multi-response models

2.1.2. Measurement error data

Buonaccorsi(1995) discussed the difficulty of prediction in the presence of measurement error.

2.2. Model averaging

Recently, model averaging methods have been developed separately for low-dimensional multi-response linear models (Zhu, 2017) and for measurement error models (Zhang, 2019).

2.2.1. Multi-response models

2.2.2. Measurement error data

2.3. High-dim variable selection

In the field of high-dimensional variable selection, some works focus on the multi-response error-in-variables model.

Let’s look at some high-dimensional multi-response regression methods first.

2.3.1. Multi-response model

2.3.1.1. One step penalty

Uematsu(2019) introduced a penalized regression for the multi-response model which imposes three penalties on the components of the SVD of \(\mathbf{C}_0\) simultaneously: \(\begin{align} \begin{aligned} (\widehat{\mathbf{D}}, \widehat{\mathbf{U}}, \widehat{\mathbf{V}})=& \underset{\mathbf{D}, \mathbf{U}, \mathbf{V}}{\arg \min }\left\{\frac{1}{2}\left\|\mathbf{Y}-\mathbf{X} \mathbf{U D V}^{T}\right\|_{F}^{2}+\lambda_{d}\|\mathbf{D}\|_{1}+\lambda_{a} \rho_{a}(\mathbf{U D})+\lambda_{b} \rho_{b}(\mathbf{V D})\right\} \\ & \text { subject to } \quad \mathbf{U}^{T} \mathbf{U}=\mathbf{I}_{m}, \quad \mathbf{V}^{T} \mathbf{V}=\mathbf{I}_{m} \end{aligned} \end{align}\)

Then the resulting estimator is \(\widehat{\mathbf{C}} = \widehat{\mathbf{U}} \widehat{\mathbf{D}} \widehat{\mathbf{V}}^T\).
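To make the objective concrete, here is a minimal Python sketch (names illustrative) that evaluates a SOFAR-type objective at a candidate \((\mathbf{U}, \mathbf{D}, \mathbf{V})\), taking entrywise \(\ell_1\) norms as one simple choice of \(\rho_a\) and \(\rho_b\); the actual estimator of Uematsu(2019) minimizes this under the orthogonality constraints.

```python
import numpy as np

def sofar_objective(Y, X, U, D, V, lam_d, lam_a, lam_b):
    """Evaluate a SOFAR-type objective at candidate factors (U, D, V).

    rho_a and rho_b are taken to be entrywise L1 norms, one simple
    choice among the penalties allowed in Uematsu(2019).
    """
    C = U @ D @ V.T                            # low-rank coefficient matrix C = U D V'
    fit = 0.5 * np.linalg.norm(Y - X @ C, "fro") ** 2
    pen_d = lam_d * np.abs(np.diag(D)).sum()   # sparsity on singular values
    pen_a = lam_a * np.abs(U @ D).sum()        # sparsity on scaled left factor
    pen_b = lam_b * np.abs(V @ D).sum()        # sparsity on scaled right factor
    return fit + pen_d + pen_a + pen_b
```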

2.3.1.2. Sequential regressions

Model \(\eqref{eq:model_2}\) can be decomposed into the sum of \(r_0\) unit-rank matrices, i.e., \(\mathbf{C}_0=\sum_{i=1}^{r_0}\mathbf{C}_{0,i}\), where \(r_0\) is the rank of \(\mathbf{C}_0\). Then \(\mathbf{C}_0\) can be estimated by recovering the \(r_0\) unit-rank matrices \(\mathbf{C}_{0,i}\) through a sequence of high-dimensional unit-rank regressions. Thus, any breakthrough in the estimation of unit-rank matrices could lead to a method for solving multi-response regression.

See Zheng(2019) and Zheng(2020) for more details.
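As a rough illustration of the sequential idea, here is a simplified low-dimensional sketch (not the actual SEED algorithm of Zheng(2019), which uses penalization and scaling for high dimensions): each step keeps only the leading unit-rank part of the fit to the current residual responses, then deflates.

```python
import numpy as np

def sequential_unit_rank(Y, X, r0):
    """Estimate C_0 as a sum of r0 unit-rank matrices by sequential
    rank-one extraction with deflation (low-dimensional sketch)."""
    d = X.shape[1]
    C_hat = np.zeros((d, Y.shape[1]))
    R = Y.copy()                                      # residual responses
    for _ in range(r0):
        C_ols = np.linalg.lstsq(X, R, rcond=None)[0]  # OLS fit on current residual
        U, s, Vt = np.linalg.svd(C_ols, full_matrices=False)
        C_r = s[0] * np.outer(U[:, 0], Vt[0])         # leading unit-rank component
        C_hat += C_r
        R = R - X @ C_r                               # deflate before the next step
    return C_hat
```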

2.3.1.3. Parallel estimation

Debiasing techniques are applied to improve an initial estimate of \(\mathbf{C}_0\), which often performs poorly. Matrix decomposition is also commonly used in parallel methods.

2.3.2. Error-in-variables model

More recently, a sequential regression method was developed for the high-dimensional multi-response error-in-variables model (Wu, 2020).

3. New research

We can consider combining Zhu(2017) and Zhang(2019) to develop a model averaging method for multi-response error-in-variables regression.

3.1. Model

Our description of the model set-up follows Hansen(2007)'s notation. We consider the multi-response regression model,
\(\begin{align} \label{eq:model_1} y_{i j}=\mu_{i j}+e_{i j}=\sum_{k=1}^{\infty} x_{i k} \mathbf{C}_{k j}+e_{i j}, \quad i=1,2, \ldots, n, j=1,2, \ldots, p, \end{align}\) where \(y_{ij}\) is the \(j\)th component of the dependent variable \(\mathbf{y}_{(i)} = (y_{i1}, y_{i2}, \dots , y_{ip})\), which is a \(p \times 1\) random vector; \(x_{ik}\) is the \(k\)th component of the independent variable \(\mathbf{x}_{(i)}\), which is a countably infinite random vector; \(\mathbf{C}_{kj}\) is the unknown coefficient of \(x_{ik}\) corresponding to the \(j\)th dependent variable; and \(e_{ij}\) is the \(j\)th component of the disturbance \(\mathbf{e}_{(i)} = (e_{i1}, e_{i2},\dots, e_{ip})\), which satisfies \(E(\mathbf{e}_{(i)} | \mathbf{x}_{(i)}) = 0\) and \(E(\mathbf{e}_{(i)}\mathbf{e}_{(i)}^T \vert \mathbf{x}_{(i)}) = \Sigma = (\sigma_{ij})_{p\times p} > 0\), i.e., \(\Sigma\) is a positive definite matrix. We assume \(p\) is fixed, \(E\mu^2_{ij} < \infty\), and the series \(\sum_{k=1}^{\infty} x_{ik} \mathbf{C}_{kj}\) converges in mean square. Equation \(\eqref{eq:model_1}\) can also be written in the matrix form \(\begin{align} \label{eq:model_2} \mathbf{Y} = \mathbf{X} \mathbf{C}_0 + \mathbf{E}, \end{align}\) where \(\mathbf{Y} = \{\mathbf{y}_{(1)},\dots, \mathbf{y}_{(n)} \}^T\in \mathbb{R}^{n \times p}\) is a multi-response matrix, \(\mathbf{X} = \{\mathbf{x}_{(1)},\dots \}^T \in \mathbb{R}^{n \times \infty}\) is the design matrix, \(\mathbf{C}_0 \in \mathbb{R}^{\infty \times p}\) is an unknown coefficient matrix, and \(\mathbf{E} = \{\mathbf{e}_{(1)},\dots,\mathbf{e}_{(n)} \}^T\in \mathbb{R}^{n \times p}\) is an error matrix whose row vectors \(\mathbf{e}_{(i)}\) are independent and identically distributed (i.i.d.) as \(\mathcal{N}(\mathbf{0}, \Sigma)\).
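As a quick illustration, the following minimal simulation sketch draws data from a finite truncation of model \(\eqref{eq:model_2}\); the sample sizes and the low-rank structure of \(\mathbf{C}_0\) are illustrative choices, not assumptions of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p, r0 = 100, 10, 3, 2   # d truncates the countably infinite regressors

# Illustrative low-rank C_0: a sum of r0 unit-rank components
C0 = sum(np.outer(rng.normal(size=d), rng.normal(size=p)) for _ in range(r0))

X = rng.normal(size=(n, d))                 # (truncated) design matrix
Sigma = 0.5 * np.eye(p) + 0.5               # positive definite error covariance
E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = X @ C0 + E                              # model (2), truncated at d regressors
```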

However, in many real applications, the covariates we collect may be contaminated by unobserved measurement errors. We consider two kinds of measurement errors associated with the design matrix \(\mathbf{X}\), listed as follows; a small simulation sketch of both types follows the list.

  1. Additive errors. The observed covariates are \(\mathbf{W} = \mathbf{X} + \mathbf{A}\), where the rows of the additive error matrix \(\mathbf{A}=(a_{ij})_{n\times \infty}\) are i.i.d. with mean vector \(\mathbf{0}\) and covariance matrix \(\mathbf{\Sigma}_{\mathbf{A}}\).

  2. Multiplicative errors. The observed covariates are \(\mathbf{W} = \mathbf{X} \circ \mathbf{M}\), where \(\circ\) denotes the Hadamard product and the rows of \(\mathbf{M}=(m_{ij})_{n\times \infty}\) are i.i.d. with mean vector \(\boldsymbol{\mu}_{\mathbf{M}}\) and covariance matrix \(\mathbf{\Sigma}_{\mathbf{M}}\).
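Here is the promised sketch generating both error types; the covariance and mean parameters are illustrative, and the final bias-correction line is the standard additive-error correction of the Gram matrix, not necessarily the exact form used in Zhang(2019).

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 10
X = rng.normal(size=(n, d))          # latent, error-free design matrix

# 1. Additive errors: W = X + A, rows of A i.i.d. with mean 0, covariance Sigma_A
Sigma_A = 0.1 * np.eye(d)            # often assumed known or estimable
A = rng.multivariate_normal(np.zeros(d), Sigma_A, size=n)
W_additive = X + A

# 2. Multiplicative errors: W = X ∘ M (Hadamard product)
mu_M, sd_M = 1.0, 0.2                # illustrative mean/sd of the m_ij
M = rng.normal(loc=mu_M, scale=sd_M, size=(n, d))
W_multiplicative = X * M             # '*' is elementwise, i.e. the Hadamard product

# Standard correction for additive errors: E(W'W) = E(X'X) + n * Sigma_A,
# so W'W - n * Sigma_A is an unbiased surrogate for X'X.
gram_corrected = W_additive.T @ W_additive - n * Sigma_A
```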

3.2. Target

  • Estimate the unknown coefficient matrix \(\mathbf{C}_0\);

  • Predict the mean \(\mathbf{X} \mathbf{C}_0\).

3.3. Method

We use \(M\) candidate models to approximate \(\eqref{eq:model_2}\), where \(M\) is allowed to diverge to infinity as \(n\to\infty\). The \(m\)th approximating (or candidate) model can use any \(k_m\) regressors belonging to \(\mathbf{x}_{(i)}\).
So the \(m\)th approximating model is \(\begin{align} \mathbf{Y}=\boldsymbol{\mu}_{(m)}+\mathbf{E}= \mathbf{X}_{(m)} \mathbf{C}_{(m)}+\mathbf{E}, \quad m=1,2, \ldots, M. \end{align}\) When \(p=1\), Zhu(2017) degenerates to a unit-response regression estimator and Zhang(2019) can be used directly.
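For reference, in Hansen(2007)'s least squares model averaging the weights are chosen by minimizing a Mallows-type criterion over the weight simplex. Below is a minimal sketch of that criterion for the \(p=1\), error-free case (function and variable names are ours; Zhang(2019)'s MALMEM criterion additionally corrects for measurement error):

```python
import numpy as np
from scipy.optimize import minimize

def mallows_weights(y, X_list, sigma2):
    """Hansen(2007)-style Mallows weights: minimize
    ||y - sum_m w_m mu_hat_m||^2 + 2 * sigma2 * sum_m w_m k_m
    over the weight simplex. sigma2 is typically estimated
    from the largest candidate model."""
    mu_hats, ks = [], []
    for X_m in X_list:
        beta = np.linalg.lstsq(X_m, y, rcond=None)[0]
        mu_hats.append(X_m @ beta)        # fitted values of the m-th model
        ks.append(X_m.shape[1])           # k_m, number of regressors used
    mu_hats = np.column_stack(mu_hats)    # n x M matrix of fitted values
    ks = np.asarray(ks, dtype=float)

    def criterion(w):
        resid = y - mu_hats @ w
        return resid @ resid + 2.0 * sigma2 * (w @ ks)

    M = len(X_list)
    w0 = np.full(M, 1.0 / M)              # start from equal weights
    res = minimize(criterion, w0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * M,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x
```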

When \(p>1\), Zhu(2017) solved \(\eqref{eq:model_2}\) by stacking the \(p\) unit-response regressions by row, turning the original problem into a single unit-response regression problem. Specifically, \(\begin{align} \label{eq:transformed_multi_response_regression} \operatorname{Vec}(\mathbf{Y})=\mathbf{D}_{(m)} \operatorname{Vec}\left(\mathbf{C}_{(m)}\right)+\operatorname{Vec}(\mathbf{E}) , \end{align}\) where \(\mathbf{D}_{(m)}=\mathbf{I}_p\otimes \mathbf{X}_{(m)}\in\mathbb{R}^{pn\times pk_m}\) is of full column rank. Then \(\eqref{eq:transformed_multi_response_regression}\) can be solved by the LSE \(\begin{align} \widehat{\operatorname{Vec}\left(\mathbf{C}_{(m)}\right)}=\left(\mathbf{D}_{(m)}^{\prime} \mathbf{D}_{(m)}\right)^{-1} \mathbf{D}_{(m)}^{\prime} \operatorname{Vec}(\mathbf{Y})=\left(\begin{array}{c} \left(\mathbf{X}_{(m)}^{\prime} \mathbf{X}_{(m)}\right)^{-1} \mathbf{X}_{(m)}^{\prime} \mathbf{y}_{1} \\ \left(\mathbf{X}_{(m)}^{\prime} \mathbf{X}_{(m)}\right)^{-1} \mathbf{X}_{(m)}^{\prime} \mathbf{y}_{2} \\ \vdots \\ \left(\mathbf{X}_{(m)}^{\prime} \mathbf{X}_{(m)}\right)^{-1} \mathbf{X}_{(m)}^{\prime} \mathbf{y}_{p} \end{array}\right)=\left(\begin{array}{c} \hat{\mathbf{C}}_{1(m)} \\ \hat{\mathbf{C}}_{2(m)} \\ \vdots \\ \hat{\mathbf{C}}_{p(m)} \end{array}\right)=\operatorname{Vec}\left(\hat{\mathbf{C}}_{(m)}\right) . \end{align}\) Then Zhang(2019) can be applied directly.
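A quick numerical check of this equivalence (all sizes illustrative): the Kronecker-form least squares solution coincides with running OLS separately for each response column.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k_m, p = 50, 4, 3
X_m = rng.normal(size=(n, k_m))     # regressors of the m-th candidate model
Y = rng.normal(size=(n, p))

# Vectorized form: Vec(Y) = (I_p ⊗ X_m) Vec(C_m) + Vec(E)
D_m = np.kron(np.eye(p), X_m)       # pn x pk_m, full column rank
vec_C = np.linalg.solve(D_m.T @ D_m, D_m.T @ Y.T.ravel())  # Vec stacks columns

# Blockwise form: one OLS fit per response column
C_cols = np.linalg.lstsq(X_m, Y, rcond=None)[0]

# The two solutions coincide up to numerical error
assert np.allclose(vec_C, C_cols.T.ravel())
```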

This does not seem difficult, so there may be no need for in-depth research on this direct combination.

3.4. Future works

  1. Some work could focus on model averaging methods for multi-response regressions in the following cases:
    • the number of responses \(p\) is large;
    • \(p\) diverges as \(n\) increases;
    • interpretability;
    • high-dimensional cases;
    • estimating/predicting multi-response regressions with methods different from Zhu(2017)'s.
  2. Then try to introduce measurement errors into the model;
  3. Extend the model averaging method to factor models or other settings.

4. References

Hansen, Bruce E. “Least squares model averaging.” Econometrica 75.4 (2007): 1175-1189.

Zhu, Rong, Guohua Zou, and Xinyu Zhang. “Model averaging for multivariate multiple regression models.” Statistics 52.1 (2018): 205-227.

Zheng, Zemin, et al. “Scalable Interpretable Multi-Response Regression via SEED.” Journal of Machine Learning Research 20.107 (2019): 1-34.

Uematsu, Yoshimasa, et al. “SOFAR: large-scale association network learning.” IEEE Transactions on Information Theory 65.8 (2019): 4924-4939.

Zhang, Xinyu, Yanyuan Ma, and Raymond J. Carroll. “MALMEM: model averaging in linear measurement error models.” Journal of the Royal Statistical Society. Series B, Statistical methodology 81.4 (2019): 763.

Zheng, Zemin, et al. “Sequential scaled sparse factor regression.” Journal of Business & Economic Statistics (2020): 1-10.

Wu, Jie, et al. “Scalable interpretable learning for multi-response error-in-variables regression.” Journal of Multivariate Analysis 179 (2020): 104644.
