@zaeema-n
Collaborator

This PR:

  • Converts the 2020 hierarchical file structure into a flat one and adds a YAML file to store the hierarchies.
  • Adds code to traverse the file structure and select the correct minister and department from the database.

Note that this PR does not yet insert any categories or datasets.
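
For illustration, traversing a flat layout through a hierarchy manifest might look roughly like the sketch below. The manifest schema (the `ministers`, `departments`, and `datasets` keys), the entity names, and the file name are all assumptions made for this example and are not taken from manifest_2020.yaml; the dictionary stands in for what `yaml.safe_load` would return.

```python
from typing import Iterator, List, Tuple

# Hypothetical manifest content, shaped like the result of yaml.safe_load().
# The real manifest_2020.yaml schema may differ.
MANIFEST = {
    "ministers": [
        {
            "name": "Minister of Example Affairs",
            "departments": [
                {"name": "Department of Samples", "datasets": ["samples_a.json"]},
                {"name": "Department of Tests", "datasets": []},
            ],
        }
    ]
}


def walk_manifest(manifest: dict) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (minister, department, dataset files) for every department."""
    for minister in manifest.get("ministers", []):
        for department in minister.get("departments", []):
            yield (
                minister["name"],
                department["name"],
                department.get("datasets", []),
            )


for minister, department, files in walk_manifest(MANIFEST):
    print(f"{minister} -> {department}: {len(files)} dataset file(s)")
```

Keeping the hierarchy in the manifest means the files on disk can stay in one directory while the minister and department relationships live in a single reviewable place.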

@zaeema-n requested a review from @ChanukaUOJ on January 20, 2026, 11:46
@gemini-code-assist
Contributor

Summary of Changes

Hello @zaeema-n, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly overhauls the data management system by flattening the 2020 data file structure and centralizing its logical organization within a YAML manifest. It introduces a robust ingestion framework capable of parsing this new manifest, resolving governmental entities against a database, and laying the groundwork for automated data insertion. This foundational change aims to improve data accessibility and streamline future data processing workflows.

Highlights

  • Data Structure Refactoring: The existing hierarchical file structure for 2020 data has been converted into a flat structure, with a new YAML manifest file (data/2020_flat/manifest_2020.yaml) introduced to define the logical hierarchies.
  • New Ingestion Logic: New Python scripts (ingest_flat_yaml.py, src/services/yaml_parser.py, src/services/entity_resolver.py, etc.) have been added to parse the YAML manifest, traverse the defined data structure, and correctly identify minister and department entities from the database.
  • API Interaction Services: Dedicated services (src/services/read_service.py, src/services/ingestion_service.py) and models (src/models/schema.py) have been implemented to interact with the OpenGIN API for reading and potentially inserting entities and relationships.
  • Dependency Updates: New dependencies (python-dotenv, aiohttp, pyyaml) have been added to environment.yml and requirements.txt to support the new functionality.
  • File Management: Old hierarchical data files have been removed, and numerous new data and metadata JSON files have been added to the data/2020_flat/datasets/ directory, reflecting the flattened structure.
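
As a rough sketch of the resolution step described in the highlights, a resolver could look up the minister first and then the department under it, memoizing results so that repeated names do not trigger repeated lookups. The `CachingResolver` class, the `lookup` signature, and the identifiers below are hypothetical and are not the code in src/services/entity_resolver.py; the fake lookup stands in for a real call to the read service.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional, Tuple

# Hypothetical lookup signature: (kind, name, parent_id) -> entity id or None.
Lookup = Callable[[str, str, Optional[str]], Awaitable[Optional[str]]]


class CachingResolver:
    """Resolve minister and department ids, remembering earlier answers."""

    def __init__(self, lookup: Lookup) -> None:
        self._lookup = lookup
        self._cache: Dict[Tuple[str, str, Optional[str]], Optional[str]] = {}
        self.calls = 0  # number of lookups that actually reached the backend

    async def _resolve(
        self, kind: str, name: str, parent_id: Optional[str] = None
    ) -> Optional[str]:
        key = (kind, name.strip().lower(), parent_id)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = await self._lookup(kind, name, parent_id)
        return self._cache[key]

    async def resolve_department(
        self, minister_name: str, department_name: str
    ) -> Tuple[Optional[str], Optional[str]]:
        minister_id = await self._resolve("minister", minister_name)
        if minister_id is None:
            return None, None  # cannot place a department without its minister
        department_id = await self._resolve(
            "department", department_name, minister_id
        )
        return minister_id, department_id


async def fake_lookup(kind: str, name: str, parent_id: Optional[str]) -> Optional[str]:
    # Stand-in for a database or API query; ids are made up for the example.
    table = {
        ("minister", "Minister of Example Affairs"): "min-1",
        ("department", "Department of Samples"): "dep-1",
        ("department", "Department of Tests"): "dep-2",
    }
    return table.get((kind, name))


async def demo() -> Tuple[tuple, tuple, int]:
    resolver = CachingResolver(fake_lookup)
    first = await resolver.resolve_department(
        "Minister of Example Affairs", "Department of Samples"
    )
    second = await resolver.resolve_department(
        "Minister of Example Affairs", "Department of Tests"
    )
    return first, second, resolver.calls


print(asyncio.run(demo()))
```

In this sketch the second call reuses the cached minister id, so only the department lookup reaches the backend. A cached `None` is also remembered, which may or may not be desirable if entities can be created during the same run.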

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a significant refactoring by moving from a hierarchical file structure to a flat one, managed by a new YAML manifest. It also adds the corresponding Python scripts to parse this new structure and resolve entities against the database. The new code is well-structured, using async practices and separating concerns into different services. However, I've identified a critical resource leak related to an unclosed aiohttp.ClientSession and a high-severity performance issue in the entity resolution logic that could lead to an excessive number of API calls. I've also included several medium-severity suggestions to improve code clarity, maintainability, and project structure. Addressing these points will greatly improve the robustness and quality of the new ingestion process.
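
One common way to avoid the unclosed-session leak mentioned above is to let the client own its `aiohttp.ClientSession` and expose it through an async context manager, so the session is closed even when a request raises. The sketch below assumes a hypothetical `ReadClient` class and a placeholder URL and path; it is not the code in src/services/read_service.py. The `session_factory` parameter is an addition for this example so the lifecycle can be exercised without a network.

```python
import asyncio
from typing import Any, Callable, Optional


class ReadClient:
    """Hypothetical API client that owns its HTTP session and always closes it."""

    def __init__(
        self,
        base_url: str,
        session_factory: Optional[Callable[[], Any]] = None,
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self._session_factory = session_factory or self._default_factory
        self._session: Any = None

    @staticmethod
    def _default_factory() -> Any:
        # Imported lazily so the class can be tested with a stand-in session.
        import aiohttp

        return aiohttp.ClientSession()

    async def __aenter__(self) -> "ReadClient":
        # Created inside the running event loop, as aiohttp expects.
        self._session = self._session_factory()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs on normal exit and on exceptions, so the session never leaks.
        if self._session is not None:
            await self._session.close()
            self._session = None

    async def get_json(self, path: str) -> Any:
        if self._session is None:
            raise RuntimeError("use 'async with ReadClient(...)' before calling")
        async with self._session.get(f"{self.base_url}{path}") as response:
            response.raise_for_status()
            return await response.json()


async def main() -> None:
    # Placeholder base URL and path, not a documented OpenGIN endpoint.
    async with ReadClient("http://localhost:8080") as client:
        print(await client.get_json("/example-path"))
```

`main()` is defined but not invoked here because it needs a reachable server; it would be started with `asyncio.run(main())`. Sharing one client across the whole ingestion run also lets aiohttp reuse connections instead of opening a new session per request.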

Contributor

@ChanukaUOJ left a comment

Added some comments. Looks good to me!

Contributor

Why does this separate environment.yml file exist?

@zaeema-n
Collaborator Author

/gemini review

Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request converts the 2020 hierarchical file structure into a flat one and creates a YAML file to store the hierarchies. It also adds code to traverse the file structure and select the correct minister and department from the database. The changes include modifications to .gitignore and environment.yml, removal of a JSON file, addition and renaming of several JSON files, addition of a YAML manifest (manifest_2020.yaml), and addition of several Python files for ingestion and exception handling. The review focuses on code correctness and maintainability, particularly in the newly added Python files.
