Missing Data Handler

Mohsen edited this page Dec 8, 2022 · 8 revisions

AllInOne can check the missing-data pattern in your dataset and impute the missing data using various imputation methods.

Check missing pattern

Figure: Missing values in each trait
Figure: Missing values percentage and pattern

The upper graph shows the observation number on the X-axis and all dependent/response variables on the Y-axis.
The lower graph shows all dependent/response variables on the X-axis and the percentage of data points on the Y-axis. Dark blue shows the percentage of existing data and yellow shows the percentage of missing data points in the dataset.
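
The two views above can be reproduced by hand on a small table. The following is a minimal sketch, not AllInOne's own code: the dataset, the variable names, and the helper functions `missing_pattern` and `missing_percent` are all hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical toy dataset: each position is one observation, None marks a missing value.
data = {
    "yield": [4.2, None, 3.9, 4.5, None],
    "height": [81.0, 79.5, None, 80.2, 82.1],
    "protein": [12.1, 11.8, 12.4, 12.0, 11.9],
}

def missing_pattern(columns):
    """Return, per variable, a list of flags that are True where the value is missing."""
    return {name: [value is None for value in values]
            for name, values in columns.items()}

def missing_percent(columns):
    """Return, per variable, the percentage of missing data points."""
    return {name: 100.0 * sum(value is None for value in values) / len(values)
            for name, values in columns.items()}

pattern = missing_pattern(data)
percent = missing_percent(data)

for name in data:
    # One text row per variable: 'X' = missing, '.' = existing.
    row = "".join("X" if flag else "." for flag in pattern[name])
    existing = 100.0 - percent[name]
    print(f"{name:>8} {row}  missing: {percent[name]:.1f}%  existing: {existing:.1f}%")
```

Each printed row corresponds to one variable in the upper graph (pattern across observations), and the two percentages correspond to the yellow and dark blue shares in the lower graph.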

You can also view the descriptive statistics of each dependent/response variable:

  • Label: Variable name
  • n: Number of available/existing data points
  • missing_n: Number of missing data points
  • missing_percent: Percentage of missing data points over the total n
  • mean: Mean of the variable
  • sd: Standard deviation of the variable
  • min: Minimum value in the variable
  • quartile_25: First quartile (25th percentile)
  • median: Median of the variable
  • quartile_75: Third quartile (75th percentile)
  • max: Maximum value in the variable
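
To make the fields above concrete, here is a minimal sketch that computes them for a single variable using only the Python standard library. It is an illustration, not AllInOne's implementation: the function `describe` and the sample values are hypothetical, the missing percentage is assumed to be taken over all rows (existing plus missing), and the quartile interpolation rule is an assumption that may differ from the one AllInOne uses.

```python
import statistics

def describe(label, values):
    """Summarise one variable; None marks a missing data point."""
    observed = [v for v in values if v is not None]
    missing_n = len(values) - len(observed)
    # Assumption: the percentage is taken over all rows (existing + missing).
    missing_percent = 100.0 * missing_n / len(values)
    # Assumption: quartiles interpolate linearly between observed values.
    q25, q50, q75 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "Label": label,
        "n": len(observed),
        "missing_n": missing_n,
        "missing_percent": missing_percent,
        "mean": statistics.mean(observed),
        "sd": statistics.stdev(observed),  # sample SD (n - 1 denominator)
        "min": min(observed),
        "quartile_25": q25,
        "median": q50,
        "quartile_75": q75,
        "max": max(observed),
    }

# Hypothetical variable with four existing and two missing data points.
summary = describe("yield", [4.0, None, 3.0, 5.0, None, 6.0])
for field, value in summary.items():
    print(f"{field}: {value}")
```

Note that all statistics except missing_n and missing_percent are computed on the existing data points only.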

Impute Missing Value

AllInOne uses the mice package to impute the missing data points. The available imputation methods are shown in the following screenshot:

Figure: Imputation methods

Select the appropriate method under the Select the imputation method option.
The number of imputations can be specified under the Specify the number of imputation option.
The number of iterations in each imputation can be specified under the Specify the number of iteration option.
To get the same results every time, you can set a seed number under the Specify the seed number option.
You can also choose whether or not the methods remove collinearity among your response/dependent variables.
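
The roles of the number of imputations and the seed can be illustrated with a deliberately simple stand-in. The sketch below does not use mice and is not how AllInOne imputes: it fills each missing value with a random draw from the observed values, and the function `impute_by_sampling` and the sample data are hypothetical. Because this stand-in is not iterative, the number of iterations has no counterpart here.

```python
import random

def impute_by_sampling(values, m=5, seed=None):
    """Return m completed copies of `values`, filling each None with a
    random draw from the observed values (a simple stand-in method)."""
    rng = random.Random(seed)  # a fixed seed makes the draws repeatable
    observed = [v for v in values if v is not None]
    completed = []
    for _ in range(m):  # one completed dataset per imputation
        completed.append([v if v is not None else rng.choice(observed)
                          for v in values])
    return completed

trait = [4.2, None, 3.9, 4.5, None, 4.1]  # hypothetical trait values
run_a = impute_by_sampling(trait, m=3, seed=123)
run_b = impute_by_sampling(trait, m=3, seed=123)
print(run_a == run_b)  # True: the same seed gives the same imputed values
```

Each of the m completed copies keeps the observed values unchanged and differs only in the filled-in points, which is why several imputations give a picture of the uncertainty in the imputed values.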

Results

Box and whisker plot:

In a box-and-whisker plot, the bottom and top edges of the box are the lower and upper quartiles, so the box covers the interquartile range, where the middle 50% of the data is found. Here, the blue box plot shows the observed data, and the red box plots show the distribution of the imputed (predicted) data points. The X-axis is the imputation number and the Y-axis is the response/dependent variable.

Figure: Box-and-whisker plots

Density plot based on the number of multiple imputations:

Here, the blue plot shows the density of the observed data, and the red plots show the density of the predicted data points. The X-axis is the response/dependent variable and the Y-axis shows the probability density obtained by kernel density estimation.

Figure: Density plot based on the number of multiple imputations

Mean of the imputed variables based on the number of multiple imputations:

Here, the plots on the left show the mean, and the plots on the right show the standard deviation (SD) of the response/dependent variables. Each color corresponds to one imputation, and the Y-axis is the response/dependent variable. In other words, the mean of the predicted data points, along with its SD, can be seen for each imputation set.

Figure: Mean of the imputed variables based on the number of multiple imputations
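
As a rough illustration of the quantities in these plots, the sketch below computes the mean and SD of the filled-in values for each imputation set. The data, the three completed copies, and the function `imputed_summary` are hypothetical and only stand in for the output of a multiple-imputation run.

```python
import statistics

# Hypothetical variable with two missing points (None) and three completed
# copies, such as a multiple-imputation run would produce.
original = [4.0, None, 3.0, 5.0, None, 6.0]
imputations = [
    [4.0, 4.0, 3.0, 5.0, 5.0, 6.0],
    [4.0, 3.0, 3.0, 5.0, 5.0, 6.0],
    [4.0, 4.0, 3.0, 5.0, 4.0, 6.0],
]

def imputed_summary(original, completed):
    """Mean and SD of the values that were filled in (not the observed ones)."""
    filled = [c for o, c in zip(original, completed) if o is None]
    return statistics.mean(filled), statistics.stdev(filled)

summaries = [imputed_summary(original, completed) for completed in imputations]
for number, (mean, sd) in enumerate(summaries, start=1):
    print(f"imputation {number}: mean = {mean:.3f}, sd = {sd:.3f}")
```

Similar means and SDs across imputation sets suggest that the imputations agree with each other; large differences suggest the imputed values are unstable.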

Strip plot based on the number of multiple imputations:

Here, the X-axis is the imputation number and the Y-axis is the response/dependent variable. The blue dots show the observed data, and the red dots show the predicted data points.

Figure: Strip plot based on the number of multiple imputations

AllInOne also produces the imputed dataset, and you have the option to replace the previous dataset with it.