Commit dc0a138

committed
run pre commit
1 parent 8762a01 commit dc0a138

140 files changed

+952
-661
lines changed

.github/FUNDING.yml

+1-1
Original file line number | Diff line number | Diff line change
@@ -12,4 +12,4 @@ lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cl
1212
polar: # Replace with a single Polar username
1313
buy_me_a_coffee: # Replace with a single Buy Me a Coffee username
1414
thanks_dev: # Replace with a single thanks.dev username
15-
custom:
15+
custom:

.github/ISSUE_TEMPLATE/custom.md

-2
@@ -6,5 +6,3 @@ labels: ''
66
assignees: ''
77

88
---
9-
10-

.github/workflows/release.yml

+13-13
@@ -19,21 +19,21 @@ jobs:
1919
uses: actions/setup-python@v5
2020
with:
2121
python-version: '3.10'
22-
22+
2323
- name: Install uv
2424
uses: astral-sh/setup-uv@v3
25-
25+
2626
- name: Install Node Env
2727
uses: actions/setup-node@v4
2828
with:
2929
node-version: 20
30-
30+
3131
- name: Checkout
3232
uses: actions/[email protected]
3333
with:
3434
fetch-depth: 0
3535
persist-credentials: false
36-
36+
3737
- name: Build and validate package
3838
run: |
3939
uv venv
@@ -44,10 +44,10 @@ jobs:
4444
uv build
4545
uv pip install --upgrade pkginfo==1.12.0 twine==6.0.1 # Upgrade pkginfo and install twine
4646
python -m twine check dist/*
47-
47+
4848
- name: Debug Dist Directory
4949
run: ls -al dist
50-
50+
5151
- name: Cache build
5252
uses: actions/cache@v3
5353
with:
@@ -59,7 +59,7 @@ jobs:
5959
runs-on: ubuntu-latest
6060
needs: build
6161
environment: development
62-
if: >
62+
if: >
6363
github.event_name == 'push' && (github.ref == 'refs/heads/main' || github.ref == 'refs/heads/pre/beta') ||
6464
(github.event_name == 'pull_request' && github.event.action == 'closed' && github.event.pull_request.merged &&
6565
(github.event.pull_request.base.ref == 'main' || github.event.pull_request.base.ref == 'pre/beta'))
@@ -74,23 +74,23 @@ jobs:
7474
with:
7575
fetch-depth: 0
7676
persist-credentials: false
77-
77+
7878
- name: Restore build artifacts
7979
uses: actions/cache@v3
8080
with:
8181
path: ./dist
8282
key: ${{ runner.os }}-build-${{ github.sha }}
83-
83+
8484
- name: Semantic Release
8585
uses: cycjimmy/[email protected]
8686
with:
8787
semantic_version: 23
8888
extra_plugins: |
8989
semantic-release-pypi@3
90-
@semantic-release/git
91-
@semantic-release/commit-analyzer@12
92-
@semantic-release/release-notes-generator@13
93-
@semantic-release/github@10
90+
@semantic-release/git
91+
@semantic-release/commit-analyzer@12
92+
@semantic-release/release-notes-generator@13
93+
@semantic-release/github@10
9494
@semantic-release/changelog@6
9595
conventional-changelog-conventionalcommits@7
9696
env:
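
The `if:` condition on the release job above can be read as a plain boolean predicate. The sketch below restates it in Python for illustration only; it is not part of the workflow. The function name, parameter names, and `RELEASE_BRANCHES` constant are hypothetical, and the sketch assumes that the expression ends at the last line shown in the hunk and that `&&` binds tighter than `||`.

```python
# Branches named in the workflow condition.
RELEASE_BRANCHES = {"main", "pre/beta"}


def should_release(event_name, ref="", action="", merged=False, base_ref=""):
    """Hypothetical restatement of the release job's `if:` expression."""
    # Case 1: a direct push to one of the release branches.
    pushed = event_name == "push" and ref in {
        f"refs/heads/{branch}" for branch in RELEASE_BRANCHES
    }
    # Case 2: a pull request that was closed by merging into a release branch.
    merged_pr = (
        event_name == "pull_request"
        and action == "closed"
        and merged
        and base_ref in RELEASE_BRANCHES
    )
    return pushed or merged_pr
```

Under these assumptions, a push to `refs/heads/main` triggers the job, while a pull request that is closed without merging does not.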

.releaserc.yml

-1
@@ -53,4 +53,3 @@ branches:
5353
channel: "dev"
5454
prerelease: "beta"
5555
debug: true
56-

Dockerfile

+1-1
@@ -6,4 +6,4 @@ RUN pip install --no-cache-dir scrapegraphai
66
RUN pip install --no-cache-dir scrapegraphai[burr]
77

88
RUN python3 -m playwright install-deps
9-
RUN python3 -m playwright install
9+
RUN python3 -m playwright install

LICENSE

+1-1
@@ -4,4 +4,4 @@ Permission is hereby granted, free of charge, to any person obtaining a copy of
44

55
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
66

7-
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
7+
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

+1-1
@@ -182,7 +182,7 @@ The Official API Documentation can be found [here](https://docs.scrapegraphai.co
182182
</a>
183183
</div>
184184

185-
## 📈 Telemetry
185+
## 📈 Telemetry
186186
We collect anonymous usage metrics to enhance our package's quality and user experience. The data helps us prioritize improvements and ensure compatibility. If you wish to opt-out, set the environment variable SCRAPEGRAPHAI_TELEMETRY_ENABLED=false. For more information, please refer to the documentation [here](https://scrapegraph-ai.readthedocs.io/en/latest/scrapers/telemetry.html).
187187

188188
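
The README context in this hunk names the opt-out variable `SCRAPEGRAPHAI_TELEMETRY_ENABLED`. A minimal sketch of setting it from Python follows. It assumes the library reads the variable when it is imported, which the diff does not confirm, and the `telemetry_enabled` check is only a local illustration of how such a flag could be interpreted.

```python
import os

# Set the opt-out flag before importing the library. This ordering matters
# only under the assumption that the variable is read at import time.
os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"] = "false"

# Local illustration only: treat any value other than "false" as enabled.
telemetry_enabled = os.environ["SCRAPEGRAPHAI_TELEMETRY_ENABLED"].lower() != "false"
```

Exporting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.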

SECURITY.md

-1
@@ -3,4 +3,3 @@
33
## Reporting a Vulnerability
44

55
For reporting a vulnerability contact directly [email protected]
6-

docs/README.md

+1-1
@@ -55,7 +55,7 @@ markmap:
5555
- Use Selenium or Playwright to take screenshots
5656
- Use LLM to asses if it is a block-like page, paragraph-like page, etc.
5757
- [Issue #88](https://github.com/VinciGit00/Scrapegraph-ai/issues/88)
58-
58+
5959
## **Long-Term Goals**
6060

6161
- Automatic generation of scraping pipelines from a given prompt

docs/russian.md

+1-1
@@ -228,4 +228,4 @@ ScrapeGraphAI лицензирован под MIT License. Подробнее с
228228
## Благодарности
229229

230230
- Мы хотели бы поблагодарить всех участников проекта и сообщество с открытым исходным кодом за их поддержку.
231-
- ScrapeGraphAI предназначен только для исследования данных и научных целей. Мы не несем ответственности за неправильное использование библиотеки.
231+
- ScrapeGraphAI предназначен только для исследования данных и научных целей. Мы не несем ответственности за неправильное использование библиотеки.

docs/source/conf.py

+9-10
@@ -12,31 +12,30 @@
1212
import sys
1313

1414
# import all the modules
15-
sys.path.insert(0, os.path.abspath('../../'))
15+
sys.path.insert(0, os.path.abspath("../../"))
1616

17-
project = 'ScrapeGraphAI'
18-
copyright = '2024, ScrapeGraphAI'
19-
author = 'Marco Vinciguerra, Marco Perini, Lorenzo Padoan'
17+
project = "ScrapeGraphAI"
18+
copyright = "2024, ScrapeGraphAI"
19+
author = "Marco Vinciguerra, Marco Perini, Lorenzo Padoan"
2020

2121
html_last_updated_fmt = "%b %d, %Y"
2222

2323
# -- General configuration ---------------------------------------------------
2424
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration
2525

26-
extensions = ['sphinx.ext.autodoc', 'sphinx.ext.napoleon']
26+
extensions = ["sphinx.ext.autodoc", "sphinx.ext.napoleon"]
2727

28-
templates_path = ['_templates']
28+
templates_path = ["_templates"]
2929
exclude_patterns = []
3030

3131
# -- Options for HTML output -------------------------------------------------
3232
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
3333

34-
html_theme = 'furo'
34+
html_theme = "furo"
3535
html_theme_options = {
3636
"source_repository": "https://github.com/VinciGit00/Scrapegraph-ai/",
3737
"source_branch": "main",
3838
"source_directory": "docs/source/",
39-
'navigation_with_keys': True,
40-
'sidebar_hide_name': False,
39+
"navigation_with_keys": True,
40+
"sidebar_hide_name": False,
4141
}
42-

docs/source/getting_started/examples.rst

+1-1
@@ -84,4 +84,4 @@ After that, you can run the following code, using only your machine resources br
8484
result = smart_scraper_graph.run()
8585
print(result)
8686
87-
To find out how you can customize the `graph_config` dictionary, by using different LLM and adding new parameters, check the `Scrapers` section!
87+
To find out how you can customize the `graph_config` dictionary, by using different LLM and adding new parameters, check the `Scrapers` section!

docs/source/getting_started/installation.rst

+2-4
@@ -22,7 +22,7 @@ The library is available on PyPI, so it can be installed using the following com
2222
pip install scrapegraphai
2323
2424
.. important::
25-
25+
2626
It is higly recommended to install the library in a virtual environment (conda, venv, etc.)
2727

2828
If your clone the repository, it is recommended to use a package manager like `uv <https://github.com/astral-sh/uv>`_.
@@ -35,7 +35,7 @@ To install the library using uv, you can run the following command:
3535
uv build
3636
3737
.. caution::
38-
38+
3939
**Rye** must be installed first by following the instructions on the `official website <https://github.com/astral-sh/uv>`_.
4040

4141
Additionally on Windows when using WSL
@@ -46,5 +46,3 @@ If you are using Windows Subsystem for Linux (WSL) and you are facing issues wit
4646
.. code-block:: bash
4747
4848
sudo apt-get -y install libnss3 libnspr4 libgbm1 libasound2
49-
50-

docs/source/index.rst

+1-1
@@ -43,4 +43,4 @@ Indices and tables
4343

4444
* :ref:`genindex`
4545
* :ref:`modindex`
46-
* :ref:`search`
46+
* :ref:`search`

docs/source/introduction/overview.rst

+5-5
@@ -3,11 +3,11 @@
33
:width: 50%
44
:alt: ScrapegraphAI
55

6-
Overview
6+
Overview
77
========
88

99
ScrapeGraphAI is an **open-source** Python library designed to revolutionize **scraping** tools.
10-
In today's data-intensive digital landscape, this library stands out by integrating **Large Language Models** (LLMs)
10+
In today's data-intensive digital landscape, this library stands out by integrating **Large Language Models** (LLMs)
1111
and modular **graph-based** pipelines to automate the scraping of data from various sources (e.g., websites, local files etc.).
1212

1313
Simply specify the information you need to extract, and ScrapeGraphAI handles the rest, providing a more **flexible** and **low-maintenance** solution compared to traditional scraping tools.
@@ -16,7 +16,7 @@ Why ScrapegraphAI?
1616
==================
1717

1818
Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages.
19-
ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
19+
ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
2020
This flexibility ensures that scrapers remain functional even when website layouts change.
2121

2222
We support many LLMs including **GPT, Gemini, Groq, Azure, Hugging Face** etc.
@@ -161,13 +161,13 @@ FAQ
161161
- Check your internet connection. Low speed or unstable connection can cause the HTML to not load properly.
162162

163163
- Try using a proxy server to mask your IP address. Check out the :ref:`Proxy` section for more information on how to configure proxy settings.
164-
164+
165165
- Use a different LLM model. Some models might perform better on certain websites than others.
166166

167167
- Set the `verbose` parameter to `True` in the graph_config to see more detailed logs.
168168

169169
- Visualize the pipeline graphically using :ref:`Burr`.
170-
170+
171171
If the issue persists, please report it on the GitHub repository.
172172

173173
6. **How does ScrapeGraphAI handle the context window limit of LLMs?**

docs/source/modules/modules.rst

-1
@@ -7,4 +7,3 @@ scrapegraphai
77
scrapegraphai
88

99
scrapegraphai.helpers.models_tokens
10-

docs/source/modules/scrapegraphai.helpers.models_tokens.rst

+1-1
@@ -25,4 +25,4 @@ Example usage:
2525
else:
2626
print(f"{model_name} not found in the models list")
2727
28-
This information is crucial for users to understand the capabilities and limitations of different AI models when designing their scraping pipelines.
28+
This information is crucial for users to understand the capabilities and limitations of different AI models when designing their scraping pipelines.

docs/source/scrapers/llm.rst

+5-6
@@ -133,11 +133,11 @@ We can also pass a model instance for the chat model and the embedding model. Fo
133133
openai_api_version="AZURE_OPENAI_API_VERSION",
134134
)
135135
# Supposing model_tokens are 100K
136-
model_tokens_count = 100000
136+
model_tokens_count = 100000
137137
graph_config = {
138138
"llm": {
139139
"model_instance": llm_model_instance,
140-
"model_tokens": model_tokens_count,
140+
"model_tokens": model_tokens_count,
141141
},
142142
"embeddings": {
143143
"model_instance": embedder_model_instance
@@ -198,7 +198,7 @@ We can also pass a model instance for the chat model and the embedding model. Fo
198198
Other LLM models
199199
^^^^^^^^^^^^^^^^
200200

201-
We can also pass a model instance for the chat model and the embedding model through the **model_instance** parameter.
201+
We can also pass a model instance for the chat model and the embedding model through the **model_instance** parameter.
202202
This feature enables you to utilize a Langchain model instance.
203203
You will discover the model you require within the provided list:
204204

@@ -208,7 +208,7 @@ You will discover the model you require within the provided list:
208208
For instance, consider **chat model** Moonshot. We can integrate it in the following manner:
209209

210210
.. code-block:: python
211-
211+
212212
from langchain_community.chat_models.moonshot import MoonshotChat
213213
214214
# The configuration parameters are contingent upon the specific model you select
@@ -221,8 +221,7 @@ For instance, consider **chat model** Moonshot. We can integrate it in the follo
221221
llm_model_instance = MoonshotChat(**llm_instance_config)
222222
graph_config = {
223223
"llm": {
224-
"model_instance": llm_model_instance,
224+
"model_instance": llm_model_instance,
225225
"model_tokens": 5000
226226
},
227227
}
228-

examples/ScrapegraphAI_cookbook.ipynb

+1-1
@@ -912,4 +912,4 @@
912912
},
913913
"nbformat": 4,
914914
"nbformat_minor": 0
915-
}
915+
}

examples/code_generator_graph/.env.example

+1-1
@@ -11,4 +11,4 @@ DEFAULT_LANGUAGE=python
1111
GENERATE_TESTS=true
1212
ADD_DOCUMENTATION=true
1313
CODE_STYLE=pep8
14-
TYPE_CHECKING=true
14+
TYPE_CHECKING=true

examples/code_generator_graph/README.md

+1-1
@@ -27,4 +27,4 @@ code = graph.generate("code specification")
2727
## Environment Variables
2828

2929
Required environment variables:
30-
- `OPENAI_API_KEY`: Your OpenAI API key
30+
- `OPENAI_API_KEY`: Your OpenAI API key

examples/code_generator_graph/ollama/code_generator_graph_ollama.py

+10-5
@@ -1,11 +1,13 @@
1-
"""
1+
"""
22
Basic example of scraping pipeline using Code Generator with schema
33
"""
44

55
import json
66
from typing import List
7+
78
from dotenv import load_dotenv
89
from pydantic import BaseModel, Field
10+
911
from scrapegraphai.graphs import CodeGeneratorGraph
1012

1113
load_dotenv()
@@ -14,13 +16,16 @@
1416
# Define the output schema for the graph
1517
# ************************************************
1618

19+
1720
class Project(BaseModel):
1821
title: str = Field(description="The title of the project")
1922
description: str = Field(description="The description of the project")
2023

24+
2125
class Projects(BaseModel):
2226
projects: List[Project]
2327

28+
2429
# ************************************************
2530
# Define the configuration for the graph
2631
# ************************************************
@@ -41,9 +46,9 @@ class Projects(BaseModel):
4146
"syntax": 3,
4247
"execution": 3,
4348
"validation": 3,
44-
"semantic": 3
49+
"semantic": 3,
4550
},
46-
"output_file_name": "extracted_data.py"
51+
"output_file_name": "extracted_data.py",
4752
}
4853

4954
# ************************************************
@@ -54,8 +59,8 @@ class Projects(BaseModel):
5459
prompt="List me all the projects with their description",
5560
source="https://perinim.github.io/projects/",
5661
schema=Projects,
57-
config=graph_config
62+
config=graph_config,
5863
)
5964

6065
result = code_generator_graph.run()
61-
print(result)
66+
print(result)
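
Many hunks in this commit pair a removed line with an added line that looks identical, which is consistent with whitespace-only fixes such as stripping trailing spaces and normalizing the final newline. The sketch below shows that kind of normalization. The helper name is hypothetical, and the hooks actually configured for this repository are not shown in the diff.

```python
def normalize_whitespace(text: str) -> str:
    """Strip trailing whitespace per line and end the text with one newline."""
    # Remove trailing spaces and tabs from every line.
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop blank lines at the end of the file.
    while lines and lines[-1] == "":
        lines.pop()
    # Non-empty files end with exactly one newline; empty files stay empty.
    return "\n".join(lines) + "\n" if lines else ""
```

Applied to a line such as `custom: ` with a trailing space, this yields `custom:` followed by a single newline, which matches the visually unchanged pairs seen throughout the diff.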
