A basic C++ command-line SEO analysis tool focused on demonstrating practical language proficiency and object-oriented programming concepts through real-world inspired HTML parsing and analysis.
- Project Overview
- Background & Objective
- Features
- Sample Websites
- How It Works
- Typical Output
- Challenges & Solutions
- Technical Design & Language Proficiency
- Usage
- Future Work
This project is not intended as a production-ready SEO tool. Instead, it is a showcase of my C++ and OOP skills, designed to simulate the process of SEO analysis on static HTML samples. The analyzer extracts and reports on SEO-relevant information such as page titles, meta descriptions, keyword occurrences, and link counts.
While exploring C++ and object-oriented programming, I sought a hands-on project to bridge theory and practice. I chose to build an SEO analyzer for HTML files, allowing me to apply file I/O, string manipulation, error handling, and OOP design patterns in a relevant context. The project serves as both a demonstration of my technical growth and a portfolio artifact.
- Loads and parses HTML files
- Extracts and reports:
  - Title and meta description
  - Keyword occurrences (case-insensitive search)
  - Internal and external links, listed and classified
- Handles varied HTML formatting (extra spaces, line breaks, letter case)
- Demonstrates modular C++ OOP design
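As an illustration of the case-insensitive keyword search listed above, here is a minimal sketch. The names `toLowerCopy` and `countKeyword` are hypothetical and not taken from the project source; the sketch assumes ASCII content and counts non-overlapping substring matches, which may differ from how the analyzer itself counts.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: returns an ASCII-lowercased copy of the input.
std::string toLowerCopy(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Hypothetical helper: counts non-overlapping, case-insensitive occurrences
// of keyword in content. It matches substrings, so "seo" also matches "seoul".
std::size_t countKeyword(const std::string& content, const std::string& keyword) {
    if (keyword.empty()) {
        return 0;  // an empty needle would otherwise match at every position
    }
    const std::string haystack = toLowerCopy(content);
    const std::string needle = toLowerCopy(keyword);
    std::size_t count = 0;
    for (std::size_t pos = haystack.find(needle); pos != std::string::npos;
         pos = haystack.find(needle, pos + needle.size())) {
        ++count;
    }
    return count;
}
```

Calling `countKeyword(html, "SEO")` on a loaded file would then count "SEO", "seo", and "Seo" alike. Matching whole words only would need an extra boundary check around each hit.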
To reflect real-world diversity, the repository includes three example sites for analysis:
- E-Commerce Product Page (ecommerce-product/index.html)
- Personal Blog (personal-blog/index.html)
- Portfolio (portfolio/index.html)
Each site is analyzed for basic SEO structure, and the analyzer’s output is demonstrated on these samples.
- Build the analyzer:
    g++ seo_analyzer.cpp -o seo_analyzer
- Run the program:
    ./seo_analyzer
- Enter the path to an HTML file and a keyword when prompted.
The analyzer reads the file, extracts the title and meta description, counts keyword occurrences (case-insensitive), and lists internal/external links.
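The link step could look roughly like the sketch below. `LinkReport`, `startsWith`, and `classifyLinks` are hypothetical names, and the classification rule (a link is external when it starts with `http://`, `https://`, or `//`) is an assumption for illustration; the project may draw the line differently.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: link targets grouped by kind.
struct LinkReport {
    std::vector<std::string> internalLinks;
    std::vector<std::string> externalLinks;
};

// Returns true when text begins with prefix.
bool startsWith(const std::string& text, const std::string& prefix) {
    return text.compare(0, prefix.size(), prefix) == 0;
}

// Hypothetical helper: collects every double-quoted href value and classifies it.
// Assumption: absolute and protocol-relative URLs count as external.
// Note that this also picks up href attributes on non-anchor tags such as <link>.
LinkReport classifyLinks(const std::string& html) {
    std::string lower = html;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    LinkReport report;
    const std::string marker = "href=\"";
    std::size_t pos = lower.find(marker);
    while (pos != std::string::npos) {
        const std::size_t start = pos + marker.size();
        const std::size_t end = lower.find('"', start);
        if (end == std::string::npos) {
            break;  // unterminated attribute value: stop scanning
        }
        const std::string url = html.substr(start, end - start);       // original case
        const std::string lowerUrl = lower.substr(start, end - start);  // for matching
        if (startsWith(lowerUrl, "http://") || startsWith(lowerUrl, "https://") ||
            startsWith(lowerUrl, "//")) {
            report.externalLinks.push_back(url);
        } else {
            report.internalLinks.push_back(url);
        }
        pos = lower.find(marker, end + 1);
    }
    return report;
}
```

The total link count is then `internalLinks.size() + externalLinks.size()`. Single-quoted or unquoted attribute values are not handled in this sketch.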
E-Commerce Product Page:
- Title: "Buy the SuperWidget 3000"
- Meta Description: "Purchase the amazing SuperWidget 3000 with free shipping!"
- Keyword 'store': found 1 time
- Links: 0

Personal Blog:
- Title: "Jane Doe's Blog"
- Meta Description: "Personal blog sharing tech tips, stories, and tutorials."
- Keyword 'SEO': found 2 times
- Links: 3 (all internal)

Portfolio:
- Title: "Sam Smith Portfolio"
- Meta Description: "Sam Smith - Web Developer Portfolio showcasing projects and skills."
- Keyword 'Developer': found 3 times
- Links: 0

This output gives a quick read on the SEO basics (title, meta description, keyword count, and link structure) of any HTML sample.
During development, I faced several common issues:
- HTML parsing fragility: The code initially failed to find titles and meta descriptions if tags had extra spaces, different case, or line breaks.
- Whitespace and formatting: Extracted values often included unwanted spaces or newlines.
- Keyword matching: Ensuring case-insensitive and accurate keyword counts was essential.
- HTML variability: Real-world HTML is not standardized; parsing logic needed robustness.
Resolution:
To address these, I improved the parsing logic:
- Converted the entire HTML content to lowercase for case-insensitive search.
- Extracted values from the original content to preserve case, trimming whitespace for clean output.
- Enhanced search patterns to handle extra spaces and line breaks.
- Tested thoroughly using diverse HTML examples.
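The first three fixes can be sketched as two small helpers. `trim` and `extractTitle` are hypothetical names, and this is one possible way to tolerate spaces, attributes, and line breaks inside the tag, not necessarily how the project does it. It searches a lowercase copy but slices the original string, so the reported title keeps its case.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: strips leading and trailing whitespace.
std::string trim(const std::string& text) {
    const char* const whitespace = " \t\r\n";
    const std::size_t first = text.find_first_not_of(whitespace);
    if (first == std::string::npos) {
        return "";  // empty or whitespace-only input
    }
    const std::size_t last = text.find_last_not_of(whitespace);
    return text.substr(first, last - first + 1);
}

// Hypothetical helper: extracts the title text while tolerating upper-case
// tags, whitespace or attributes inside the opening tag, and line breaks.
std::string extractTitle(const std::string& html) {
    // ASCII lowercasing keeps the length, so indices are valid in both strings.
    std::string lower = html;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::size_t open = lower.find("<title");
    if (open == std::string::npos) return "";
    const std::size_t openEnd = lower.find('>', open);  // end of the opening tag
    if (openEnd == std::string::npos) return "";
    const std::size_t close = lower.find("</title", openEnd + 1);
    if (close == std::string::npos) return "";
    return trim(html.substr(openEnd + 1, close - openEnd - 1));
}
```

Matching on `<title` rather than the exact string `<title>` is what lets inputs such as `<TITLE >` through. The trade-off is that an unrelated tag whose name merely starts with "title" would also match.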
Example of robust parsing logic:
    // Assumed context: a member function of the analyzer class. bodyContent,
    // title, and metaDescription are data members, and toLower() returns a
    // lowercase copy of its argument.
    void loadFromFile(const std::string& filepath) {
        title.clear();
        metaDescription.clear();
        bodyContent.clear();

        std::ifstream file(filepath);
        if (!file.is_open()) {
            return;  // leave all fields empty if the file cannot be opened
        }
        std::stringstream buffer;
        buffer << file.rdbuf();
        bodyContent = buffer.str();

        // Search in a lowercase copy; extract from the original to preserve case.
        const std::string lowerContent = toLower(bodyContent);
        const char* const whitespace = " \n\r\t";

        // Find <title>...</title>; the closing tag must come after the opening tag.
        const size_t titleStart = lowerContent.find("<title>");
        if (titleStart != std::string::npos) {
            const size_t textStart = titleStart + 7;  // skip "<title>"
            const size_t titleEnd = lowerContent.find("</title>", textStart);
            if (titleEnd != std::string::npos) {
                title = bodyContent.substr(textStart, titleEnd - textStart);
                title.erase(0, title.find_first_not_of(whitespace));
                title.erase(title.find_last_not_of(whitespace) + 1);
            }
        }

        // Find the meta description; assumes content="..." follows name="description".
        const size_t metaStart = lowerContent.find("name=\"description\"");
        if (metaStart != std::string::npos) {
            size_t contentPos = lowerContent.find("content=\"", metaStart);
            if (contentPos != std::string::npos) {
                contentPos += 9;  // skip content="
                const size_t metaEnd = lowerContent.find("\"", contentPos);
                if (metaEnd != std::string::npos) {
                    metaDescription = bodyContent.substr(contentPos, metaEnd - contentPos);
                    metaDescription.erase(0, metaDescription.find_first_not_of(whitespace));
                    metaDescription.erase(metaDescription.find_last_not_of(whitespace) + 1);
                }
            }
        }
    }

This project demonstrates:
- Encapsulation and modularity: Parsing logic and data representation are separated into dedicated classes.
- Inheritance & polymorphism: Extended analyzer functionality using classic OOP approaches.
- String manipulation and file I/O: Robust handling of reading files and searching/extracting data.
- Error handling: Graceful management of file and parsing exceptions.
- Readable, maintainable code: Structured for easy understanding and extensibility.
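To make the inheritance and polymorphism point concrete, here is a small sketch of how separate checks could share one interface. The class names `SeoCheck`, `KeywordCheck`, and `LinkCheck` and the function `runAll` are hypothetical and are not the project's actual class layout. The matching inside each check is deliberately simplified (case-sensitive, any `href=` attribute).

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class: every check turns raw HTML into a one-line report.
class SeoCheck {
public:
    virtual ~SeoCheck() = default;
    virtual std::string run(const std::string& html) const = 0;
};

// Counts non-overlapping occurrences of text inside html.
std::size_t countOccurrences(const std::string& html, const std::string& text) {
    if (text.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t pos = html.find(text); pos != std::string::npos;
         pos = html.find(text, pos + text.size())) {
        ++count;
    }
    return count;
}

// Reports how often a fixed keyword appears (case-sensitive in this sketch).
class KeywordCheck : public SeoCheck {
public:
    explicit KeywordCheck(std::string keyword) : keyword_(std::move(keyword)) {}
    std::string run(const std::string& html) const override {
        return "Keyword '" + keyword_ + "': " +
               std::to_string(countOccurrences(html, keyword_));
    }
private:
    std::string keyword_;
};

// Reports the number of href attributes as a rough link count.
class LinkCheck : public SeoCheck {
public:
    std::string run(const std::string& html) const override {
        return "Links: " + std::to_string(countOccurrences(html, "href="));
    }
};

// Runs every check through the base-class interface.
std::vector<std::string> runAll(const std::vector<std::unique_ptr<SeoCheck>>& checks,
                                const std::string& html) {
    std::vector<std::string> lines;
    for (const auto& check : checks) {
        lines.push_back(check->run(html));
    }
    return lines;
}
```

With this shape, adding a new analysis means adding one derived class; `runAll` and its callers stay unchanged.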
- Build:
    g++ seo_analyzer.cpp -o seo_analyzer
- Run:
    ./seo_analyzer
- Follow the prompts to analyze any HTML file and keyword.
- Add modules for analyzing header structure, readability, and generating detailed reports.
- Integrate a real HTML parser for more complex documents.
- Expand set of sample websites and analysis features.
SEO--Optimization is a learning project designed to demonstrate my skills in C++ and object-oriented programming. It parses sample HTML files for SEO-relevant data, with robust logic that reflects real-world variability. Suggestions, improvements, and contributions are welcome!
