This project is a tool to extract information about books and series from websites such as Amazon. The following steps will guide you through setting up and using the tool.
-
Install Python
Make sure Python (version 3.10 or newer) is installed on your system. You can download it from https://www.python.org/downloads/. Always ensure you are using the latest version of Python for compatibility and performance improvements.
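If you are unsure which interpreter your terminal picks up, a quick check like the one below can confirm it meets the 3.10 minimum. This is an illustrative sketch and not part of the project; the name `meets_minimum` is made up for the example.

```python
import sys

MIN_VERSION = (3, 10)  # minimum version stated in this README


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) pair is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {current} is new enough.")
    else:
        print(f"Python {current} is too old; install 3.10 or newer.")
```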
-
Set Up Virtual Environment
Create a virtual environment to manage project dependencies. Run the following command in your terminal:
python -m venv venv
-
Activate Virtual Environment
- On Windows:
venv\Scripts\activate
- On macOS/Linux:
source venv/bin/activate
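To verify that activation worked, you can ask Python whether it is running inside a virtual environment. The snippet below is a small sketch and not part of the project; `in_virtualenv` is a hypothetical helper name. It relies on the standard behaviour that `sys.prefix` and `sys.base_prefix` differ inside an environment created with `python -m venv`.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter is running inside a venv.

    Inside an environment created by `python -m venv`, sys.prefix points
    at the environment, while sys.base_prefix points at the Python
    installation the environment was created from.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```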
-
Install Dependencies
Double-click setup.bat or run the following command to install all the necessary Python libraries from requirements.txt:
pip install -r requirements.txt
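After installing, you may want to confirm that nothing listed in requirements.txt is missing. The following is a rough sketch, not part of the project: the helper names are hypothetical, and the parsing is deliberately simplified (it keeps only the leading package name and does not handle URL or VCS requirements).

```python
import re
from importlib import metadata
from pathlib import Path


def requirement_names(lines):
    """Extract bare package names from requirements-style lines.

    Simplified: skips blanks, comments, and option lines such as
    "-r other.txt", and ignores extras, versions, and markers.
    """
    names = []
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines):
    """Return the names that importlib.metadata cannot find installed."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    req_file = Path("requirements.txt")
    if req_file.exists():
        lines = req_file.read_text(encoding="utf-8").splitlines()
        print("missing:", missing_packages(lines) or "none")
    else:
        print("requirements.txt not found in the current directory")
```

Run it from the project folder with the virtual environment activated, so that it inspects the same packages the application will use.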
-
Configure the Base URL
Update the config.json file with the base URL of the target site. For example:
{ "baseurl": "http://www.amazon.co.jp/" }
-
Launch the Application
Run launch.bat to start the application. Make sure your virtual environment is activated before launching. This will open a GUI window.
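Because the application depends on the base URL in config.json, a malformed file is an easy thing to rule out before launching. The sketch below is not taken from the project and does not show how the tool itself reads its configuration; `is_web_url` and `read_base_url` are hypothetical helpers. The only detail taken from this README is the "baseurl" key.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def is_web_url(value):
    """Return True if value is an http(s) URL with a host part."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def read_base_url(config_path="config.json"):
    """Load the config file and return its "baseurl", or raise ValueError."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    base_url = config.get("baseurl") if isinstance(config, dict) else None
    if not is_web_url(base_url):
        raise ValueError(f'"baseurl" is missing or not a web address: {base_url!r}')
    return base_url


if __name__ == "__main__":
    if Path("config.json").exists():
        print("base URL:", read_base_url())
    else:
        print("config.json not found in the current directory")
```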
-
The GUI allows you to search for books using their name, a direct book link, or a series link.
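As a rough idea of how a search box can accept both names and links, the sketch below separates web links from plain text. It is an assumption for illustration only, not the project's actual logic, and `classify_query` is a hypothetical name. Telling a book link apart from a series link needs site-specific rules that are not shown here.

```python
from urllib.parse import urlparse


def classify_query(text):
    """Return "link" for http(s) URLs and "name" for anything else."""
    candidate = text.strip()
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse rejects some malformed inputs; treat those as names.
        return "name"
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "link"
    return "name"
```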
-
Once a search is performed, the information is displayed in the GUI:
-
Extract and Save HTML Output
After extraction, the data is saved in HTML format in the /output directory. The output format is displayed below:
- Implement an auto-correction feature (using a language model or AI crawler) to prevent issues when the website source changes.
- Add functionality to grab books that belong to a series but do not have a dedicated series page.
- Enable extraction of R18 novels (this will require a Japanese IP and bypassing age verification).


