learner/README.Rmd at main · stmcg/learner · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
```

<!-- badges: start -->
[![CRAN status](https://www.r-pkg.org/badges/version/learner)](https://CRAN.R-project.org/package=learner)
[![R-CMD-check](https://github.com/stmcg/learner/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/stmcg/learner/actions/workflows/R-CMD-check.yaml)
[![Codecov test coverage](https://codecov.io/gh/stmcg/learner/graph/badge.svg)](https://app.codecov.io/gh/stmcg/learner)
<!-- badges: end -->

# learner

The `learner` package implements transfer learning methods for low-rank matrix estimation. These methods leverage similarity in the latent row and column spaces between the source and target populations to improve estimation in the target population. The methods include the LatEnt spAce-based tRaNsfer lEaRning (LEARNER) method and the direct projection LEARNER (D-LEARNER) method described by [McGrath et al. (2024)](https://doi.org/10.48550/arXiv.2412.20605).

## Installation

You can install the released version of `learner` from [CRAN](https://CRAN.R-project.org/package=learner) with:
``` r
install.packages("learner")
```

You can install the development version of `learner` from [GitHub](https://github.com/stmcg/learner) with:

``` r
# install.packages("devtools")
devtools::install_github("stmcg/learner")
```

## Example

We illustrate an example of how `learner` can be used. We first load the package.
```{r, results="hide", message=FALSE}
library(learner)
```

In this illustration, we will use one of the toy data sets in the package (`dat_highsim`) that has a high degree of similarity between the latent spaces of the source and target populations. The object `dat_highsim` is a list which contains the observed source population data matrix `Y_source` and the target population data matrix `Y_target`. Since the data was simulated, the true values of the matrices are included in `dat_highsim` as `Theta_source` and `Theta_target`.

#### LEARNER Method

We can apply the LEARNER method via the `learner` function.

This method allows for flexible patterns of heterogeneity across the source and target populations. It consequently requires specifying the tuning parameters `lambda_1` and `lambda_2` which control the degree of transfer learning between the populations. These values can be selected based on cross-validation via the `cv.learner` function. For example, we can specify candidate values of 1, 10, and 100 for `lambda_1` and `lambda_2` and select the optimal values based on cross-validation as follows:
```{r}
res_cv.learner <- cv.learner(Y_source = dat_highsim$Y_source,
                             Y_target = dat_highsim$Y_target,
                             lambda_1_all = c(1, 10, 100),
                             lambda_2_all = c(1, 10, 100),
                             step_size = 0.003)
res_cv.learner$lambda_1_min
res_cv.learner$lambda_2_min
```

Next, we apply the `learner` function with these values of `lambda_1` and `lambda_2`:
```{r}
res_learner <- learner(Y_source = dat_highsim$Y_source,
                       Y_target = dat_highsim$Y_target,
                       lambda_1 = 100, lambda_2 = 1,
                       step_size = 0.003)
```
The LEARNER estimate is given by the `learner_estimate` component in the output of the `learner` function, e.g.,
```{r}
res_learner$learner_estimate[1:5, 1:5]
```
We can check the convergence of the numerical optimization algorithm by plotting the values of the objective function at each iteration:
```{r, out.width="75%"}
plot(res_learner$objective_values,
     xlab = 'Iteration', ylab = 'Objective function',
     type = 'l', col = 'blue', lwd = 2)
```


#### D-LEARNER Method

We can apply the D-LEARNER method via the `dlearner` function. This method makes stronger assumptions on the heterogeneity across the source and target populations. It consequently does not rely on choosing tuning parameters. The `dlearner` function can be applied as follows:
```{r}
res_dlearner <- dlearner(Y_source = dat_highsim$Y_source,
                         Y_target = dat_highsim$Y_target)
res_dlearner$dlearner_estimate[1:5, 1:5]
```