
Java application for integrating and processing data from Wikipedia and DB-City. Extracts, normalizes, and stores country information in XML validated by DTD/XSD, supporting XPath/XQuery queries, XSLT transformations, and a graphical interface.


RitaP03/Data-Integration-with-XML


🧩 Data Integration with XML

Java application developed for data extraction, transformation, and integration from heterogeneous web sources (Wikipedia and DB-City).
The system consolidates country information into a structured XML document that is validated with DTD/XSD, queried with XPath/XQuery, and transformed with XSLT.


🚀 Setup / Execution

  1. Open the project folder DataIntegration_Project/ in Eclipse, IntelliJ IDEA, or another Java IDE.
  2. Ensure Java 11 (or higher) is configured, along with the required XML libraries: JAXP, which ships with the JDK, and an XQuery processor, which must be added separately.
  3. Run the main Java class to start the data integration process.
  4. The program will extract data from Wikipedia and DB-City, process it, and generate a validated XML file.
  5. You can then use the provided XSLT and XQuery scripts to transform and query the XML data.
  6. After these steps, the system will produce a structured XML dataset ready for analysis or visualization.
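
Step 5 can also be driven from Java code. The following is a minimal sketch of applying an XSLT stylesheet with the JDK's built-in `javax.xml.transform` API. The class name `XsltSketch`, the element names (`countries`, `country`, `capital`), and the inline stylesheet are assumptions made for illustration; the project's actual schema and XSLT scripts may look different.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltSketch {

    // Hypothetical integrated document; the real element names may differ.
    static final String COUNTRIES_XML =
        "<countries>"
      + "<country name=\"Portugal\"><capital>Lisbon</capital></country>"
      + "<country name=\"Spain\"><capital>Madrid</capital></country>"
      + "</countries>";

    // Hypothetical stylesheet that emits one "name: capital;" entry per country.
    static final String TO_TEXT_XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/countries\">"
      + "<xsl:for-each select=\"country\">"
      + "<xsl:value-of select=\"@name\"/>: <xsl:value-of select=\"capital\"/>;"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet and applies it to the XML, returning the result as a string.
    static String transform(String xml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(COUNTRIES_XML, TO_TEXT_XSL));
    }
}
```

To work with files instead of inline strings, the `StringReader` sources can be swapped for `new StreamSource(new java.io.File(...))` pointing at the generated XML and one of the provided stylesheets.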

🗺 Project Diagram

Below is a simplified view of the Data Integration workflow, representing how information flows from external sources through the integration and transformation stages:

[Wikipedia]        [DB-City]
      ↓                ↓
   Data Extraction & Parsing (Java)
              ↓
       XML Normalization
              ↓
     Validation (DTD / XSD)
              ↓
   Querying (XPath / XQuery)
              ↓
     Transformation (XSLT)
              ↓
  Output: Integrated XML / HTML / Text

(If the image doesn't load, check the PDF version in Data_Integration_Report/DataIntegration_Diagram.pdf.)
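
The normalization stage in the workflow above can be pictured as merging per-source field maps into a single DOM tree. This is only a sketch under stated assumptions: the class `NormalizationSketch`, the method `integrate`, the field names, and the rule that Wikipedia values override DB-City values on conflict are all hypothetical, and the project's own merge logic is not documented here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NormalizationSketch {

    // Builds <countries><country name="..."> with one child element per merged field.
    // Keys must be valid XML element names; values are trimmed before storage.
    static Document integrate(String name,
                              Map<String, String> wikipedia,
                              Map<String, String> dbCity)
            throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .newDocument();

        Element root = doc.createElement("countries");
        doc.appendChild(root);

        Element country = doc.createElement("country");
        country.setAttribute("name", name.trim());
        root.appendChild(country);

        // Assumption for illustration: DB-City fields are loaded first and
        // Wikipedia fields overwrite them when both sources supply the same key.
        Map<String, String> merged = new LinkedHashMap<>(dbCity);
        merged.putAll(wikipedia);

        for (Map.Entry<String, String> field : merged.entrySet()) {
            Element element = doc.createElement(field.getKey());
            element.setTextContent(field.getValue().trim());
            country.appendChild(element);
        }
        return doc;
    }
}
```

Keeping the precedence rule in one place makes it easy to change if a different source should be treated as authoritative for a given field.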


πŸ“ Project Structure

  • DataIntegration_Project/ → Source code and XML processing logic
  • Data_Integration_Report/ → Documentation and project report
  • README.md → Project overview and technical description

βš™οΈ Main Features

  • Data integration from external web sources (Wikipedia, DB-City)
  • Data normalization and transformation into a unified XML format
  • DTD and XSD validation to ensure data consistency
  • XPath and XQuery for structured queries
  • XSLT transformations to convert XML into HTML and text formats
  • Graphical user interface (GUI) for managing and visualizing data
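
As an illustration of the XSD validation feature, here is a minimal sketch using the JDK's `javax.xml.validation` API. The schema, the class name `XsdValidationSketch`, and the element names are assumptions invented for this example; the project's real XSD is not reproduced here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class XsdValidationSketch {

    // Hypothetical schema: each country needs a name attribute, a capital,
    // and a population that is a positive integer.
    static final String SCHEMA =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"countries\"><xs:complexType><xs:sequence>"
      + "<xs:element name=\"country\" maxOccurs=\"unbounded\">"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name=\"capital\" type=\"xs:string\"/>"
      + "<xs:element name=\"population\" type=\"xs:positiveInteger\"/>"
      + "</xs:sequence>"
      + "<xs:attribute name=\"name\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns true when the XML conforms to SCHEMA, false on any schema or
    // validation error (both surface as SAXException).
    static boolean isValid(String xml) throws IOException {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder values, not real data.
        String sample = "<countries><country name=\"Example\">"
            + "<capital>Example City</capital><population>1000</population>"
            + "</country></countries>";
        System.out.println(isValid(sample));
    }
}
```

In a real pipeline it is usually more helpful to log the `SAXException` message, which names the offending element and line, than to collapse the result into a boolean.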

🧩 Technologies Used

  • Java (data handling, XML parsing, and GUI implementation)
  • XML / DTD / XSD (data structure and validation)
  • XPath / XQuery / XSLT (data querying and transformation)
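
To show how Java and XPath fit together, the sketch below looks up a country's capital with the JDK's `javax.xml.xpath` API. The class `XPathSketch`, the method `capitalOf`, and the document layout (`/countries/country/@name`, `capital`) are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathSketch {

    // Returns the capital of the named country, or an empty string if no
    // matching <country> element exists.
    static String capitalOf(String xml, String country) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Bind the country name as an XPath variable rather than concatenating
        // it into the expression, so quotes in the input cannot break the query.
        xpath.setXPathVariableResolver(
            qname -> "name".equals(qname.getLocalPart()) ? country : null);

        return xpath.evaluate(
            "/countries/country[@name=$name]/capital",
            new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml = "<countries>"
            + "<country name=\"Portugal\"><capital>Lisbon</capital></country>"
            + "<country name=\"Spain\"><capital>Madrid</capital></country>"
            + "</countries>";
        System.out.println(capitalOf(xml, "Spain"));
    }
}
```

XPath is available in the JDK out of the box; XQuery is not, so equivalent XQuery examples would depend on whichever external processor the project uses.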

🧠 Learning Outcomes

  • Experience with structured data representation (XML)
  • Implementation of data validation and transformation pipelines
  • Application of query languages (XPath, XQuery) for information retrieval
  • Development of a user-friendly data management interface

📚 About

Developed for the Data Integration and Information Systems course at the Instituto Superior de Engenharia de Coimbra (ISEC).
The project focuses on the practical application of data extraction, integration, and transformation techniques, simulating a real-world data processing workflow.

