
Analyze public documentation for readability, structure, completeness, and style using NLP and Selenium. Generates actionable reports and optional rewrites.

saniyaacharya04/document_analyzer

AI Documentation Analyzer

This tool analyzes public documentation (e.g., MoEngage docs) and provides a structured report evaluating:

  1. Readability
  2. Structure & Flow
  3. Completeness & Examples
  4. Style Guidelines

Optionally, it can also revise the content based on the generated suggestions.


Features

  • Evaluate readability using metrics like Flesch Reading Ease and Gunning Fog Index.
  • Assess structural quality: headings, paragraph flow, and list usage.
  • Check completeness: presence of explanations, examples, and step-by-step instructions.
  • Evaluate writing style against the Microsoft Writing Style Guide.
  • Optionally rewrite content to improve clarity, flow, and readability.
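For reference, the two readability metrics named above follow standard published formulas: Flesch Reading Ease = 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), and Gunning Fog = 0.4 × ((words / sentences) + 100 × (complex words / words)). The sketch below computes them directly. It is not the project's code: it assumes a crude vowel-group syllable counter and treats any word with three or more estimated syllables as "complex", so its numbers will not exactly match textstat (which the project appears to use, judging by the note about its warning) or the example report.

```python
from __future__ import annotations

import re


def count_syllables(word: str) -> int:
    """Crude estimate: count runs of vowels, ignoring a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1  # "make" -> 1, "template" -> 2
    return max(count, 1)


def _sentences_and_words(text: str) -> tuple[list[str], list[str]]:
    # Naive splitting: sentences end at . ! or ?, words are runs of letters/apostrophes.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word and one sentence")
    return sentences, words


def flesch_reading_ease(text: str) -> float:
    # 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    sentences, words = _sentences_and_words(text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * len(words) / len(sentences) - 84.6 * syllables / len(words)


def gunning_fog(text: str) -> float:
    # 0.4 * ((words / sentences) + 100 * (complex words / words)),
    # where "complex" here simply means 3+ estimated syllables.
    sentences, words = _sentences_and_words(text)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return 0.4 * (len(words) / len(sentences) + 100 * complex_words / len(words))
```

For instance, `flesch_reading_ease("The cat sat on the mat. It was happy.")` returns about 108.27 and `gunning_fog` on the same text returns about 1.8; higher Flesch scores mean easier text, while a higher Fog index means harder text.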

Project Structure

document_analyzer/
│
├── analyzer/                 # Core orchestration scripts
│   ├── __init__.py
│   ├── analyzer.py           # Main CLI entry point
│   ├── analyzer_logic.py     # Orchestrates various assessment modules
│   ├── report_generator.py   # Aggregates results into structured output
│   └── revision_agent.py     # Optional agent to revise text
│
├── modules/                  # Individual functional modules
│   ├── __init__.py
│   ├── completeness.py       # Completeness & examples checker
│   ├── readability.py        # Readability metrics calculator
│   ├── structure.py          # Checks structure and flow
│   ├── style.py              # Evaluates style guideline adherence
│   ├── scraper.py            # Scrapes content using Selenium
│   └── utils.py              # Helper functions (e.g., text cleaner)
│
├── reports/                  # Generated or example reports
│   ├── example_report.json
│   └── report.json
│
├── requirements.txt          # Python package dependencies
├── README.md                 # Project documentation
├── LICENSE
└── venv/                     # Virtual environment (optional)
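The layout suggests a pipeline in which analyzer_logic.py runs each module on the scraped text and report_generator.py merges the results into one report. The sketch below illustrates that pattern only; the function names, the registry, and the simple heuristics inside the checkers are hypothetical stand-ins, not the repository's code. The report keys and assessment messages are borrowed from the example output in this README, and the sketch assumes the scraped text keeps Markdown-style headings and list markers.

```python
from __future__ import annotations

import re
from typing import Callable


# Hypothetical stand-in for modules/structure.py.
def check_structure(text: str) -> dict:
    lines = text.splitlines()
    # Assumption: headings start with "#", list items with -, *, • or "1.".
    headings = sum(1 for ln in lines if ln.lstrip().startswith("#"))
    list_items = sum(1 for ln in lines if re.match(r"\s*(?:[-*•]|\d+\.)\s", ln))
    tips = []
    if headings < 2:
        tips.append("Add more headings for better navigation.")
    if list_items == 0:
        tips.append("Use bullet or numbered lists to improve readability.")
    return {"assessment": " ".join(tips) or "Structure looks good."}


# Hypothetical stand-in for modules/completeness.py.
def check_completeness(text: str) -> dict:
    has_example = re.search(r"\b(?:for example|examples?)\b|e\.g\.", text, re.IGNORECASE)
    has_steps = re.search(r"^\s*(?:step\s+\d+\b|\d+\.\s)", text, re.IGNORECASE | re.MULTILINE)
    if has_example or has_steps:
        return {"assessment": "Examples or step-by-step instructions found."}
    return {"assessment": "No clear examples or step-by-step instructions detected. Consider adding some."}


# Hypothetical registry mapping report keys to checker functions.
CHECKS: dict[str, Callable[[str], dict]] = {
    "structure_and_flow": check_structure,
    "completeness_and_examples": check_completeness,
}


def build_report(url: str, text: str) -> dict:
    """Run every registered check and collect the results under their report keys."""
    report: dict = {"url": url}
    for key, check in CHECKS.items():
        report[key] = check(text)
    return report
```

A registry like this keeps the orchestrator independent of individual modules: adding a new check means writing one function and adding one entry.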

Installation

1. Clone the repository

git clone https://github.com/saniyaacharya04/document_analyzer.git
cd document_analyzer

2. Create and activate a virtual environment (optional but recommended)

python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

3. Install dependencies

pip install --upgrade setuptools  # Ensure pkg_resources is available
pip install -r requirements.txt

Usage

Analyze a Documentation URL

python -m analyzer.analyzer "https://help.moengage.com/hc/en-us/articles/4415622460948-Push-Templates" -o reports/report.json

This command will:

  • Fetch the article via Selenium
  • Analyze it across Readability, Structure, Completeness, and Style
  • Save a structured JSON report to reports/report.json

Optional: Use the shell shortcut script

If you have written a run_analysis.sh wrapper script (it is not part of the repository layout shown above), you can run:

./run_analysis.sh "https://help.moengage.com/hc/en-us/articles/4415622460948-Push-Templates"

The report will be saved automatically in reports/report.json.


Example Output

{
  "url": "https://help.moengage.com/hc/en-us/articles/4415622460948-Push-Templates",
  "readability": {
    "flesch_reading_ease": 47.43,
    "gunning_fog_index": 9.47,
    "feedback": "Moderate difficulty."
  },
  "structure_and_flow": {
    "assessment": "Add more headings for better navigation. Use bullet or numbered lists to improve readability."
  },
  "completeness_and_examples": {
    "assessment": "No clear examples or step-by-step instructions detected. Consider adding some."
  },
  "style_guidelines": {
    "assessment": "Style is clear, concise, and user-friendly."
  }
}
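Because the report is plain JSON, it is easy to post-process. As an illustration, the snippet below loads a report shaped like the example above and produces one line per section. The helpers load_report and summarize are hypothetical and not part of the repository; they assume, as in the example, that readability carries a "feedback" field and the other sections carry an "assessment" field.

```python
from __future__ import annotations

import json
from pathlib import Path


def load_report(path: str | Path) -> dict:
    """Read a JSON report such as reports/report.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def summarize(report: dict) -> list[str]:
    """Return one 'section: message' line per analysis section."""
    lines = []
    for section, result in report.items():
        if not isinstance(result, dict):
            continue  # skip top-level scalars such as "url"
        # In the example, readability uses "feedback"; other sections use "assessment".
        message = result.get("assessment") or result.get("feedback", "")
        lines.append(f"{section}: {message}")
    return lines
```

For example, `print("\n".join(summarize(load_report("reports/report.json"))))` prints a four-line digest of the report after running the analyzer.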

Optional: Content Rewriting Agent

You can use revision_agent.py to revise documentation with suggested improvements:

python analyzer/revision_agent.py input_article.txt suggestions.json -o reports/revised_output.md

Notes

  • Content is fetched with Selenium driving headless Chrome, which works on pages that block plain HTTP requests.
  • Make sure Google Chrome is installed and chromedriver is available on your PATH.
  • The textstat warning about pkg_resources is harmless and can be ignored.

Style Guide

Style recommendations are based on the Microsoft Writing Style Guide:

  • Clear and concise writing
  • Friendly, conversational tone
  • Step-by-step instructions instead of passive descriptions
  • Consistent use of headings, lists, and examples
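To make these recommendations concrete, here is a rough heuristic check in the same spirit. It is a hypothetical sketch, not the logic in modules/style.py: it flags sentences longer than an assumed 25-word limit and sentences matching a naive passive-voice pattern (a form of "be" followed by a word ending in "-ed"). That pattern misses irregular participles such as "written" and can produce false positives, so treat its output as hints for a human reviewer.

```python
from __future__ import annotations

import re

MAX_WORDS = 25  # assumed threshold for "concise"; tune as needed
# Naive passive-voice pattern: a form of "be" followed by an -ed word.
PASSIVE = re.compile(r"\b(?:am|is|are|was|were|be|been|being)\s+\w+ed\b", re.IGNORECASE)


def style_issues(text: str) -> list[str]:
    """Return human-readable warnings for long or possibly passive sentences."""
    issues = []
    # Split after sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = sentence.split()
        if len(words) > MAX_WORDS:
            issues.append(f"Long sentence ({len(words)} words): {sentence[:40]}...")
        if PASSIVE.search(sentence):
            issues.append(f"Possible passive voice: {sentence}")
    return issues
```

For instance, "The template is created by the admin. Click Save." yields a single passive-voice warning for the first sentence; rewriting it as "The admin creates the template." clears it.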

License

This project is licensed under the MIT License.
