Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use async io where possible to improve runtime performance #163

Open
wants to merge 18 commits into
base: main
Choose a base branch
from

Conversation

netomi
Copy link

@netomi netomi commented Nov 29, 2023

This PR fixes #162 .

Things that this PR will modify:

  • use aiohttp whereever appropriate to improve performance
  • cleanups of existing code where appropriate (e.g. get_pypi_data_from_purl which returned duplicate data for each package)

@netomi netomi marked this pull request as ready for review November 30, 2023 12:31
@TG1999
Copy link
Contributor

TG1999 commented Nov 30, 2023

@netomi thanks! please check the errors in the CI and also do not forget to regen the tests just in case dependencies in tests have been updated https://github.com/nexB/python-inspector#testing

@netomi
Copy link
Author

netomi commented Nov 30, 2023

There are a couple of things that I still need to fix, but the failing CI runs are unrelated to this PR afaict. They also exist for other PRs and look like that the setup is broken.

@TG1999
Copy link
Contributor

TG1999 commented Nov 30, 2023

@netomi please rebase your PR with latest main, tests are fixed in this PR #165

@netomi
Copy link
Author

netomi commented Nov 30, 2023

ok so there are some unit test failure that I need to address. Most of them are from the fact that python-inspector returned duplicate package entries in its result up to now, which I fixed, so the expected test results need to be adjusted to take this change into account.

@netomi
Copy link
Author

netomi commented Nov 30, 2023

I could fix most tests (weird thing is that 4 tests fail locally when run in pycharm, but work on the command line so that might be a pycharm problem), there are some weird test failures on macos and windows only. Do you have any idea about that?

@netomi
Copy link
Author

netomi commented Dec 1, 2023

still need to take a look at etc/scripts if the there also need to be adjustments after the changes.

@pombredanne
Copy link
Member

@netomi BTW, do not bother with the etc/scripts ... these come from the shared skeleton repo at https://github.com/nexB/skeleton and should not be of concern for you.

@sschuberth
Copy link
Contributor

Can this be moved forward, please? 😬

@TG1999
Copy link
Contributor

TG1999 commented Jan 26, 2024

@sschuberth tests are falling, tests needs to be regen

@sschuberth
Copy link
Contributor

@sschuberth tests are falling, tests needs to be regen

Would you have time to do this @netomi?

@netomi
Copy link
Author

netomi commented Nov 19, 2024

hmm I can take a look, but I am quite discouraged from contributing to a repo that has to constantly rebuilding its tests and they are failing most of the time when creating a PR. You still need to inspect if its related to your changes or the test data is just out of date.

@pombredanne
Copy link
Member

hmm I can take a look, but I am quite discouraged from contributing to a repo that has to constantly rebuilding its tests and they are failing most of the time when creating a PR. You still need to inspect if its related to your changes or the test data is just out of date.

@netomi I agree this is not great. What do you think we could do as an alternative?

@netomi
Copy link
Author

netomi commented Nov 20, 2024

please dont actually call live servers during tests but rather mock their responses with data you might have retrieved from the server at one point in time. I understand that is a bit harder to do, but ultimately results in much more stable and reliable tests imho.

PyGitHub has a nice system for their tests where they replay data:

https://github.com/PyGithub/PyGithub

@sschuberth
Copy link
Contributor

At least one of the errors seems to be

ImportError: cannot import name 'cached_property' from 'functools' (/opt/hostedtoolcache/Python/3.7.17/x64/lib/python3.7/functools.py)

@netomi
Copy link
Author

netomi commented Nov 28, 2024

no idea where this error is coming from. This decorator is not used anywhere in the code. Probably from a dependency? This error btw also occurs for the default branch, so its not related to this PR afaict.

@sschuberth
Copy link
Contributor

This error btw also occurs for the default branch, so its not related to this PR afaict.

@TG1999 or @chinyeungli, can you assist here by fixing the test baseline in main to move things forward?

@sschuberth
Copy link
Contributor

Release https://github.com/aboutcode-org/python-inspector/releases/tag/v0.13.1 says to fix the CI issues. So @netomi could you please rebase and give this a try again?

@sschuberth
Copy link
Contributor

Merge branch 'main' into use-asyncio

I guess you should rebase instead to not make the DCO check fail on the merge commit?

@netomi
Copy link
Author

netomi commented Mar 26, 2025

so I rebased my PR but there are some test failures after that due to duplicate data in the test data json files.
I think I remember cleaning up the test data in this regard as my changes avoid such duplicates but am not sure anymore.

Here is an example: https://github.com/aboutcode-org/python-inspector/blob/main/tests/data/test-api-expected.json
It contains the click dependency twice.

@sschuberth
Copy link
Contributor

Here is an example: https://github.com/aboutcode-org/python-inspector/blob/main/tests/data/test-api-expected.json
It contains the click dependency twice.

@TG1999 is that something you could fix in main?

@netomi
Copy link
Author

netomi commented Mar 26, 2025

I could figure out how to regenerate test fixtures.

@netomi
Copy link
Author

netomi commented Mar 26, 2025

I am lost, running the tests pass locally.
On azure there are still lots of failures for which I dont know where they come from.

= 168 passed, 2 skipped, 1068 warnings in 83.39s (0:01:23) 

@TG1999
Copy link
Contributor

TG1999 commented Mar 26, 2025

@netomi I am checking this

@TG1999
Copy link
Contributor

TG1999 commented Mar 26, 2025

@netomi I have pushed a regen fix, let's see how it goes. Additionally sign-off is missing from commits. Please check that. Thanks!

netomi and others added 13 commits March 26, 2025 10:10
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
Signed-off-by: Tushar Goel <[email protected]>
Signed-off-by: Thomas Neidhart <[email protected]>
@TG1999
Copy link
Contributor

TG1999 commented Mar 26, 2025

@netomi For Python 3.12 build is failing

@netomi
Copy link
Author

netomi commented Mar 26, 2025

ah that is due to requirements, I did test something locally

diff --git a/requirements.txt b/requirements.txt
index 91fd48a..f64e888 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -10,7 +10,7 @@ idna==3.3
 importlib-metadata==4.12.0
 intbitset==3.1.0
 packageurl-python==0.10.0
-packaging==21.3
+packaging==24.2
 packvers==21.5
 pip-requirements-parser==32.0.1
 pkginfo2==30.0.0
@@ -24,5 +24,5 @@ text-unidecode==1.3
 toml==0.10.2
 urllib3==1.26.11
 zipp==3.8.1
-aiohttp==3.8.6
+aiohttp==3.11.14
 aiofiles==23.2.1
tn@proteus:~/workspace/netomi/python-

that solved it

@netomi
Copy link
Author

netomi commented Mar 26, 2025

why is the signoff needed actually? This is just annoying. Or I am going to enable that by default in my gitconfig.

@netomi
Copy link
Author

netomi commented Mar 26, 2025

wow everything passes.

@sschuberth
Copy link
Contributor

why is the signoff needed actually?

For the DCO check. That's pretty much standard.

Or I am going to enable that by default in my gitconfig.

I'm doing the same with a commit template.

@netomi
Copy link
Author

netomi commented Mar 26, 2025

Thats the first project that requires my commits to be signed-off.

@sschuberth
Copy link
Contributor

Thats the first project that requires my commits to be signed-off.

I doubt that, ORT does the same. Are you maybe confusing sign-off with signing commits?

@netomi
Copy link
Author

netomi commented Mar 26, 2025

I understand the purpose of having my commits signed-off, its a lightweight version of an ICLA. I might have signed-off some commits in the past, but cant recall it anymore tbh.

@sschuberth
Copy link
Contributor

wow everything passes.

@TG1999 or @pombredanne, are we good to merge?

@pombredanne
Copy link
Member

@netomi would you have some benchmark that shows a performance improvement BTW? if yes do you mind to paste this here?

@netomi
Copy link
Author

netomi commented Mar 27, 2025

with a simple requirements.txt I get the following numbers:

latest release:

n@proteus:~/workspace/eclipse/otterdog$ time python-inspector -r requirements.txt -p 3.12 -o linux --json output.json
/home/tn/.local/pipx/shared/lib/python3.10/site-packages/_distutils_hack/__init__.py:15: UserWarning: Distutils was imported before Setuptools, but importing Setuptools also replaces the `distutils` module in `sys.modules`. This may lead to undesirable behaviors or errors. To avoid these issues, avoid using distutils directly, ensure that setuptools is installed in the traditional way (e.g. not an editable install), and/or make sure that setuptools is always imported before distutils.
  warnings.warn(
/home/tn/.local/pipx/shared/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(

real	0m21,232s
user	0m6,449s
sys	0m0,313s

asyncio version:

(venv) tn@proteus:~/workspace/netomi/python-inspector$ time PYTHONPATH=src python src/python_inspector/resolve_cli.py -r r.txt -p 3.12 -o linux --json output.json 

real	0m5,558s
user	0m5,205s
sys	0m0,219s

with the asyncio version you get some nice output with --verbose what data is retrieved from pypi to see what is happening under the hood.

@netomi
Copy link
Author

netomi commented Mar 27, 2025

btw. python-inspector uses distutils internally, which has been removed in python 3.12. So you cant use it anymore with a more recent python version.

@netomi
Copy link
Author

netomi commented Mar 27, 2025

this is what I get when installing python-inspector using pipx with python 3.12:

tn@proteus:~/workspace/eclipse/otterdog$ pipx install python-inspector
  installed package python-inspector 0.13.1, installed using Python 3.12.7
  These apps are now globally available
    - python-inspector
done! ✨ 🌟 ✨
tn@proteus:~/workspace/eclipse/otterdog$ time python-inspector -r requirements.txt -p 3.12 -o linux --json output.json
Traceback (most recent call last):
  File "/home/tn/.local/bin/python-inspector", line 8, in <module>
    sys.exit(resolve_dependencies())
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/python_inspector/resolve_cli.py", line 228, in resolve_dependencies
    from python_inspector.api import resolve_dependencies as resolver_api
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/python_inspector/api.py", line 33, in <module>
    from python_inspector.package_data import get_pypi_data_from_purl
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/python_inspector/package_data.py", line 22, in <module>
    from python_inspector.resolution import get_python_version_from_env_tag
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/python_inspector/resolution.py", line 41, in <module>
    from python_inspector.setup_py_live_eval import iter_requirements
  File "/home/tn/.local/pipx/venvs/python-inspector/lib/python3.12/site-packages/python_inspector/setup_py_live_eval.py", line 21, in <module>
    import distutils.core
ModuleNotFoundError: No module named 'distutils'

@netomi
Copy link
Author

netomi commented Mar 27, 2025

that answer explains why it works in a development environment:

https://stackoverflow.com/a/77284076

distutils seems to be included in setuptools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use asyncio to gain performance
4 participants