requesting help in reproducing your results #20

@alifarhat40

Hello,

I came across this work and found it interesting how density estimation is performed over sparse grids.

May I have the code to reproduce this?

Image

Image

From this paper:
https://mediatum.ub.tum.de/doc/1540919/834492.pdf

Figure 3 in this paper also applies sparse grids but does not provide data.
https://cims.nyu.edu/~pehersto/preprints/sgde_siam.pdf

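This is what I currently do:

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from sklearn.datasets import make_moons

n_samples = 1000  # placeholder value; any sample count works here

# Step 1: make the moons dataset
X, y = make_moons(n_samples=n_samples, noise=0.05, random_state=42)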

# Step 2: Fit the kernel density estimator (two continuous variables)
dens_u = sm.nonparametric.KDEMultivariate(data=X, var_type='cc', bw='normal_reference')

# Step 3: Define bins (37 edges = 36 bins per axis)
x_bins = 36
y_bins = 36
x_edges = np.linspace(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, x_bins + 1)
y_edges = np.linspace(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, y_bins + 1)

# Step 4: Compute bin centers for KDE evaluation
x_centers = (x_edges[:-1] + x_edges[1:]) / 2
y_centers = (y_edges[:-1] + y_edges[1:]) / 2
xx_centers, yy_centers = np.meshgrid(x_centers, y_centers)
bin_centers = np.vstack([xx_centers.ravel(), yy_centers.ravel()]).T

# Step 5: Evaluate KDE on the grid
density = dens_u.pdf(bin_centers)
density = density.reshape((y_bins, x_bins)) 

# Step 6: Create edge meshgrid for pcolormesh plotting
xx_edges, yy_edges = np.meshgrid(x_edges, y_edges)

# Step 7: Plot the density heatmap
plt.figure(figsize=(8, 6))
mesh = plt.pcolormesh(xx_edges, yy_edges, density, shading='auto', cmap='Blues')
plt.scatter(X[:, 0], X[:, 1], c='black', s=5, alpha=0.5)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Multivariate kernel density estimator')
plt.colorbar(mesh, label='Estimated Density')
plt.show()

This produces the image below: a 36 × 36 mesh grid with one density value per grid cell.
Image

May I have the code to do this with sparse grids instead? My method works in 2D, but it is too costly in higher dimensions. With 10 dimensions and 36 bins per dimension, the full mesh grid has 36^10 (roughly 3.7 × 10^15) points on which to evaluate the kernel density.
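To put numbers on this, here is a small sketch comparing the full-grid size with the size of a regular sparse grid. It assumes the standard regular sparse grid without boundary points, whose point count at level n in d dimensions is sum_{i=0}^{n-1} 2^i * C(d-1+i, d-1). The function names are my own and are not part of SG++.

```python
from math import comb


def full_grid_points(bins_per_dim: int, dim: int) -> int:
    """Number of cells in a full tensor-product grid."""
    return bins_per_dim ** dim


def regular_sparse_grid_points(level: int, dim: int) -> int:
    """Number of points in a regular sparse grid without boundary points.

    Assumes the standard construction: all hierarchical subspaces with
    level vector l (each l_k >= 1) and |l|_1 <= level + dim - 1.
    """
    return sum(2 ** i * comb(dim - 1 + i, dim - 1) for i in range(level))


if __name__ == "__main__":
    dim = 10
    print("full grid, 36 bins per dim:", full_grid_points(36, dim))
    for level in (3, 4, 5):
        print("sparse grid, level", level, ":", regular_sparse_grid_points(level, dim))
```

The sparse grid count grows far more slowly with the dimension than the full grid does, which is why I am hoping it stays tractable for 10 features.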

That way I can understand how to use it on my large dataset, which has 2638 samples and 10 features. If I build a mesh grid by splitting each dimension into 10 equal-width bins, kernel density estimation becomes infeasible (10^10 hypercube bins in total). With the SG++ sparse grids, I am hoping to reduce that and obtain densities only at the relevant grid points, together with their coordinates. For example, grid position [10,5,2,3,2,1,1,2,1,5] = 0.66245, and so on. Does the sparse grid method provide such coordinates?

I need to compute the density over discretized bins, but I need to do it efficiently.

I am unable to figure out how to obtain the sparse grid coordinates [x1,x2,x3,x4,x5,x6,x7,x8,x9,x10] together with the density at each location. A sparse set of points is fine, because it at least tells me where to compute the density in a local neighborhood of hypercubes instead of at just one place.

Another naive method is to manually bin each dimension into equal-width bins, count how many points fall into each bin, and compute a discrete density by dividing that count by the total number of data points. But I would like a smooth density estimate at the hypercube grid locations.
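For reference, the naive binning approach can be sketched as follows. Only occupied bins are stored, so memory is bounded by the number of samples rather than by 10^10. The function name, the random demo data, and the choice of 10 bins per dimension are illustrative assumptions; the returned values are probability masses per bin (divide by the bin volume to get a density).

```python
import numpy as np


def occupied_bin_mass(X, bins_per_dim=10):
    """Return the integer bin index of every occupied bin and its probability mass."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    hi = X.max(axis=0)
    width = (hi - lo) / bins_per_dim
    width[width == 0] = 1.0  # guard against constant features

    # Map each sample to its bin index per dimension; the maximum falls in the last bin.
    idx = np.floor((X - lo) / width).astype(int)
    idx = np.clip(idx, 0, bins_per_dim - 1)

    # Collapse identical index rows: one row per occupied hypercube.
    cells, counts = np.unique(idx, axis=0, return_counts=True)
    return cells, counts / len(X)


# Demo with synthetic data of the same shape as my dataset (2638 x 10).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(2638, 10))
cells, mass = occupied_bin_mass(X_demo, bins_per_dim=10)
print(cells.shape, mass.sum())
```

The result is piecewise constant and very noisy in 10 dimensions, since most occupied bins contain a single point, which is why I am looking for a smoother estimate.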

I need this for a downstream task I am doing. In high dimensions (like 8D), a full meshgrid of bin centers becomes infeasible due to the curse of dimensionality. Can I treat the sparse grid points from SGpp as a structured, adaptive set of hypercube centers?
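To make the question concrete, this is roughly what I mean by "sparse grid points as hypercube centers". The sketch below does not use SG++ at all: it enumerates the points of a regular sparse grid (no boundary points, no adaptivity) on the unit cube in plain NumPy and rescales them to a data bounding box. The function names and the bounding box are my own assumptions, and evaluating a density at these points is not the same as the sparse grid density estimation described in the papers.

```python
from itertools import product

import numpy as np


def level_vectors(dim, level):
    """Yield level vectors l (each l_k >= 1) with sum(l) <= level + dim - 1."""
    def rec(prefix, remaining, budget):
        if remaining == 0:
            yield tuple(prefix)
            return
        # Every dimension still to be filled needs at least level 1.
        max_l = budget - (remaining - 1)
        for l in range(1, max_l + 1):
            yield from rec(prefix + [l], remaining - 1, budget - l)

    yield from rec([], dim, level + dim - 1)


def regular_sparse_grid(dim, level):
    """Coordinates in (0, 1)^dim of a regular sparse grid without boundary points."""
    points = []
    for lv in level_vectors(dim, level):
        # On level l the hierarchical points are i / 2^l for odd i.
        odd_indices = [range(1, 2 ** l, 2) for l in lv]
        for idx in product(*odd_indices):
            points.append([i / 2 ** l for i, l in zip(idx, lv)])
    return np.array(points)


# 10-dimensional, level-3 grid: 241 points instead of 10^10 bins.
unit_points = regular_sparse_grid(dim=10, level=3)

# Rescale to a hypothetical data bounding box [lo, hi] per feature.
lo = np.full(10, -3.0)
hi = np.full(10, 3.0)
grid_points = lo + unit_points * (hi - lo)
print(grid_points.shape)
```

Each row of `grid_points` is a coordinate vector [x1, ..., x10] at which any fitted estimator could be evaluated (for example the `pdf` method of the `KDEMultivariate` object above), giving the coordinate/density pairs I am after.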

Any direction or advice on how to obtain sparse grid coordinates with their density value for high dimensional data would be greatly appreciated.

@flo2k @obersteiner @severin617 @vhaller @FHof
