Skip to content

A Python utility for cloning GitHub repositories and generating a JSON-friendly representation of their directory structure, with support for .gitignore rules, customizable exclude patterns, and private repositories. Ideal for developers looking to analyze and visualize repository layouts efficiently.

License

Notifications You must be signed in to change notification settings

OguzhanCetinkaya/github-repo-structure

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GitHub Repository Structure Analyzer

A Python utility for cloning GitHub repositories and generating a JSON-friendly representation of their directory structure. This tool provides a clean way to analyze repository layouts while respecting .gitignore rules and offering customizable exclude patterns for different project types.

Features

  • Clone GitHub repositories with progress visualization
  • Generate nested JSON representation of repository structure
  • Support for private repositories via GitHub tokens
  • Respect .gitignore patterns
  • Configurable depth limit for directory traversal
  • Pre-defined exclude patterns for common project types (Python, Node.js, Java)
  • Progress bar visualization during cloning using alive-progress
  • Handles symlinks and special files appropriately

Installation

Prerequisites

Make sure you have Python 3.6+ installed. Then install the required dependencies:

pip install -r requirements.txt

Usage

Basic Usage

from repo_structure import get_repo_structure

# Simple example with default settings
structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder"
)

Advanced Usage

from repo_structure import get_repo_structure, PROJECT_EXCLUDES
import json

# Using project-specific excludes and depth limit
structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder",
    token="your_github_token",  # For private repos
    max_depth=2,  # Limit directory traversal depth
    exclude_patterns=PROJECT_EXCLUDES["python"]  # Use Python-specific excludes
)

# Print the structure as formatted JSON
print(json.dumps(structure, indent=2))

Private Repositories

For private repositories, you'll need to provide a GitHub personal access token:

structure = get_repo_structure(
    github_url="https://github.com/oguzhancetinkaya/private-repo.git",
    clone_path="private_repo_folder",
    token="your_github_personal_access_token"
)

Exclude Patterns

The tool comes with predefined exclude patterns for different project types:

  • DEFAULT_EXCLUDES: Basic patterns like .git, node_modules, venv
  • PYTHON_EXCLUDES: Python-specific patterns
  • NODE_EXCLUDES: Node.js-specific patterns
  • JAVA_EXCLUDES: Java-specific patterns

You can also provide your own custom exclude patterns:

custom_excludes = [
    "*.log",
    "temp",
    "custom_folder"
]

structure = get_repo_structure(
    github_url="https://github.com/username/repo.git",
    clone_path="local_repo_folder",
    exclude_patterns=custom_excludes
)

Output Format

The tool generates a nested dictionary structure that represents the repository layout:

{
  "name": "repo_name",
  "type": "directory",
  "children": [
    {
      "name": "src",
      "type": "directory",
      "children": [
        {
          "name": "main.py",
          "type": "file"
        }
      ]
    }
  ]
}

Components

repo_structure.py

The main module containing:

  • Directory traversal logic
  • Repository cloning functionality
  • Exclude pattern definitions
  • .gitignore integration

clone_progress.py

A helper module that provides:

  • Custom progress handler for Git operations
  • Integration with the alive-progress library
  • Visual feedback during repository cloning

Contributing

Contributions are welcome! Some areas for potential improvement:

  • Additional project-type exclude patterns
  • Support for other Git hosting services
  • Enhanced progress reporting
  • Additional output formats
  • Performance optimizations for large repositories

License

This project is open-source. See the LICENSE file for details.


Thank you for using github-repo-structure ! If you have any questions or suggestions, don’t hesitate to create an issue or open a discussion. Happy coding! ```

About

A Python utility for cloning GitHub repositories and generating a JSON-friendly representation of their directory structure, with support for .gitignore rules, customizable exclude patterns, and private repositories. Ideal for developers looking to analyze and visualize repository layouts efficiently.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages