Build advanced movie recommendation system#3
Conversation
…oach Co-authored-by: deepshekhardas1234 <deepshekhardas1234@gmail.com>
5 issues found across 25 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="recommender/hybrid.py">
<violation number="1" location="recommender/hybrid.py:75">
P1: The `by_likes` branch does not filter out liked/disliked movies from candidates, so the user's own selections are likely to appear (often at the top) of the recommendations. The `by_user` branch correctly excludes seen movies — the same pattern should be applied here.</violation>
</file>
<file name="recommender/cf.py">
<violation number="1" location="recommender/cf.py:101">
P1: The sklearn fallback computes RMSE on training data (reconstruction error), while the Surprise path computes it on a held-out test set. These are not comparable: training RMSE is always lower, so the sklearn backend will look artificially better. Split the data the same way (e.g., hold out 20% of observed entries) before computing RMSE, so the metric is meaningful regardless of backend.</violation>
</file>
<file name="recommender/posters.py">
<violation number="1" location="recommender/posters.py:10">
P2: `lru_cache` permanently caches transient HTTP failures. If a request times out or the server returns a 5xx, `None` is cached and the poster will never be fetched for that movie until the process restarts. Consider caching only successful results (e.g., use a manual dict cache that stores only non-`None` values).</violation>
</file>
<file name="recommender/data.py">
<violation number="1" location="recommender/data.py:26">
P1: Calling `zf.extractall(base)` without validating member paths is vulnerable to zip-slip attacks — a crafted archive (from `MOVIELENS_LOCAL_ZIP` or a compromised download) could write files outside `base`. Validate that each member's resolved path starts with the target directory, or on Python 3.12+ use the `filter='data'` parameter.</violation>
<violation number="2" location="recommender/data.py:47">
P1: Falling back to `verify=False` on `SSLError` silently disables TLS certificate verification, making the download vulnerable to man-in-the-middle attacks. A network attacker could serve a malicious zip that then gets extracted to disk. Remove this fallback — if TLS fails, the error should propagate so users can fix their environment (e.g., install `certifi` or update CA certs) rather than silently downgrade security.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
cf_scores = _minmax(cf_scores)
content_scores = _minmax(content_scores)
hybrid = weight_cf * cf_scores + (1.0 - weight_cf) * content_scores
candidates = index_to_movie
P1: The by_likes branch does not filter out liked/disliked movies from candidates, so the user's own selections are likely to appear (often at the top) of the recommendations. The by_user branch correctly excludes seen movies — the same pattern should be applied here.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At recommender/hybrid.py, line 75:
<comment>The `by_likes` branch does not filter out liked/disliked movies from candidates, so the user's own selections are likely to appear (often at the top) of the recommendations. The `by_user` branch correctly excludes seen movies — the same pattern should be applied here.</comment>
<file context>
@@ -0,0 +1,84 @@
+ cf_scores = _minmax(cf_scores)
+ content_scores = _minmax(content_scores)
+ hybrid = weight_cf * cf_scores + (1.0 - weight_cf) * content_scores
+ candidates = index_to_movie
+
+ # Prepare result DataFrame
</file context>
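One way to apply the `by_user` exclusion pattern to this branch is sketched below. The names `filter_seen`, `liked_ids`, and `disliked_ids` are hypothetical stand-ins for the branch's actual variables, assuming `hybrid` is a score vector aligned with `index_to_movie`:

```python
import numpy as np
import pandas as pd

def filter_seen(hybrid: np.ndarray, index_to_movie: pd.Index,
                liked_ids: set, disliked_ids: set) -> pd.DataFrame:
    """Drop the user's own selections before ranking, mirroring by_user."""
    seen = liked_ids | disliked_ids
    mask = np.array([mid not in seen for mid in index_to_movie])
    return pd.DataFrame({
        "movieId": np.asarray(index_to_movie)[mask],
        "score": hybrid[mask],
    }).sort_values("score", ascending=False)
```

With the mask applied before sorting, a liked movie can no longer surface at the top of its own recommendation list.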
    "item_means": item_means.astype(np.float32),
}

recon = (user_factors @ item_factors.T) + item_means
P1: The sklearn fallback computes RMSE on training data (reconstruction error), while the Surprise path computes it on a held-out test set. These are not comparable: training RMSE is always lower, so the sklearn backend will look artificially better. Split the data the same way (e.g., hold out 20% of observed entries) before computing RMSE, so the metric is meaningful regardless of backend.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At recommender/cf.py, line 101:
<comment>The sklearn fallback computes RMSE on training data (reconstruction error), while the Surprise path computes it on a held-out test set. These are not comparable: training RMSE is always lower, so the sklearn backend will look artificially better. Split the data the same way (e.g., hold out 20% of observed entries) before computing RMSE, so the metric is meaningful regardless of backend.</comment>
<file context>
@@ -0,0 +1,152 @@
+ "item_means": item_means.astype(np.float32),
+ }
+
+ recon = (user_factors @ item_factors.T) + item_means
+ obs_pred = recon[nonzero]
+ obs_true = mat[nonzero]
</file context>
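A minimal sketch of the suggested split: mask out a random fraction of the *observed* entries before fitting, then score predictions only on those held-out cells (function names here are illustrative, not the PR's):

```python
import numpy as np

def split_observed(mat: np.ndarray, test_frac: float = 0.2, seed: int = 0):
    """Hold out a random test_frac of observed (nonzero) entries.

    Returns (train_mat, test_idx): train_mat has the held-out entries
    zeroed so the factorization never sees them; test_idx addresses the
    held-out (row, col) positions in the original matrix.
    """
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(mat)
    n_test = max(1, int(len(rows) * test_frac))
    pick = rng.choice(len(rows), size=n_test, replace=False)
    test_idx = (rows[pick], cols[pick])
    train = mat.copy()
    train[test_idx] = 0.0
    return train, test_idx

def heldout_rmse(recon: np.ndarray, mat: np.ndarray, test_idx) -> float:
    """RMSE on held-out entries only, comparable to Surprise's test RMSE."""
    err = recon[test_idx] - mat[test_idx]
    return float(np.sqrt(np.mean(err ** 2)))
```

Fitting `user_factors`/`item_factors` on `train_mat` and evaluating `recon` via `heldout_rmse` makes the metric a generalization error on both backends, rather than a reconstruction error on one.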
def _extract(zip_file: Path):
    with zipfile.ZipFile(zip_file, "r") as zf:
        zf.extractall(base)
P1: Calling zf.extractall(base) without validating member paths is vulnerable to zip-slip attacks — a crafted archive (from MOVIELENS_LOCAL_ZIP or a compromised download) could write files outside base. Validate that each member's resolved path starts with the target directory, or on Python 3.12+ use the filter='data' parameter.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At recommender/data.py, line 26:
<comment>Calling `zf.extractall(base)` without validating member paths is vulnerable to zip-slip attacks — a crafted archive (from `MOVIELENS_LOCAL_ZIP` or a compromised download) could write files outside `base`. Validate that each member's resolved path starts with the target directory, or on Python 3.12+ use the `filter='data'` parameter.</comment>
<file context>
@@ -0,0 +1,97 @@
+
+ def _extract(zip_file: Path):
+ with zipfile.ZipFile(zip_file, "r") as zf:
+ zf.extractall(base)
+
+ if local_zip and Path(local_zip).exists():
</file context>
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
except Exception:
    pass
with requests.get(MOVIELENS_URL, stream=True, timeout=60, verify=False) as r:  # type: ignore[arg-type]
P1: Falling back to verify=False on SSLError silently disables TLS certificate verification, making the download vulnerable to man-in-the-middle attacks. A network attacker could serve a malicious zip that then gets extracted to disk. Remove this fallback — if TLS fails, the error should propagate so users can fix their environment (e.g., install certifi or update CA certs) rather than silently downgrade security.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At recommender/data.py, line 47:
<comment>Falling back to `verify=False` on `SSLError` silently disables TLS certificate verification, making the download vulnerable to man-in-the-middle attacks. A network attacker could serve a malicious zip that then gets extracted to disk. Remove this fallback — if TLS fails, the error should propagate so users can fix their environment (e.g., install `certifi` or update CA certs) rather than silently downgrade security.</comment>
<file context>
@@ -0,0 +1,97 @@
+ urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
+ except Exception:
+ pass
+ with requests.get(MOVIELENS_URL, stream=True, timeout=60, verify=False) as r: # type: ignore[arg-type]
+ r.raise_for_status()
+ with open(zip_path, "wb") as f:
</file context>
PLACEHOLDER = "https://via.placeholder.com/500x750?text=No+Image"

@lru_cache(maxsize=4096)
P2: lru_cache permanently caches transient HTTP failures. If a request times out or the server returns a 5xx, None is cached and the poster will never be fetched for that movie until the process restarts. Consider caching only successful results (e.g., use a manual dict cache that stores only non-None values).
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At recommender/posters.py, line 10:
<comment>`lru_cache` permanently caches transient HTTP failures. If a request times out or the server returns a 5xx, `None` is cached and the poster will never be fetched for that movie until the process restarts. Consider caching only successful results (e.g., use a manual dict cache that stores only non-`None` values).</comment>
<file context>
@@ -0,0 +1,61 @@
+PLACEHOLDER = "https://via.placeholder.com/500x750?text=No+Image"
+
+
+@lru_cache(maxsize=4096)
+def _omdb_poster(title: str, year: Optional[int]) -> Optional[str]:
+ api_key = os.getenv("OMDB_API_KEY")
</file context>
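A manual cache along the suggested lines might look like this (a sketch; the `fetch` callable stands in for the real OMDb HTTP request, which the snippet does not reproduce):

```python
from typing import Callable, Dict, Optional, Tuple

_poster_cache: Dict[Tuple[str, Optional[int]], str] = {}

def omdb_poster(title: str, year: Optional[int],
                fetch: Callable[[str, Optional[int]], Optional[str]]) -> Optional[str]:
    """Cache only successful lookups so transient failures can be retried."""
    key = (title, year)
    if key in _poster_cache:      # hit: only non-None results are ever stored
        return _poster_cache[key]
    url = fetch(title, year)
    if url is not None:           # failures (None) are NOT cached
        _poster_cache[key] = url
    return url
```

Unlike `lru_cache`, a timeout or 5xx returns `None` once and the poster is fetched again on the next request instead of being missing until restart.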
This pull request contains changes generated by Cursor background composer.