
Install latest tensorflow using pip #2263

Open: wants to merge 7 commits into main


mathbunnyru
Member

Describe your changes

Issue ticket if applicable

Checklist (especially for first-time contributors)

  • I have performed a self-review of my code
  • If it is a core feature, I have added thorough tests
  • I will try not to use force-push to make the review process easier for reviewers
  • I have updated the documentation for significant changes

@mathbunnyru
Member Author

I asked for help in a similar TF issue.

@mathbunnyru
Member Author

@benz0li could you maybe please test my idea?

If you have a machine with a GPU (unfortunately, I don't), download the built tensorflow-notebook CUDA image.
Here is the image: https://github.com/jupyter/docker-stacks/actions/runs/14256142130/artifacts/2879926212
You need to unzip it and then run zstd --uncompress --stdout ./FILE | docker load.

Then run the container both with and without GPU access, and run this snippet in each case:

import tensorflow as tf

print(tf.constant("Hello, TensorFlow"))
print(tf.reduce_sum(tf.random.normal([1000, 1000])))

If it works when a GPU is enabled, I will simply disable this test for the CUDA image.
It should fail without a GPU enabled (which is what currently happens in CI).
That way, people running the GPU image without a GPU will get an error, which is a good thing (if they want the CPU version, they can use the regular tensorflow-notebook image).
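The download steps described in this comment (unzip the artifact, then pipe `zstd --uncompress --stdout ./FILE` into `docker load`) can also be scripted. The following is a minimal Python sketch, assuming `zstd` and `docker` are on PATH; `extract_artifact` and `docker_load_zstd` are hypothetical helper names, not part of docker-stacks.

```python
import subprocess
import zipfile
from pathlib import Path


def extract_artifact(zip_path: str, dest: str) -> list[Path]:
    """Unzip a downloaded GitHub Actions artifact and return the extracted file paths."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
        # namelist() may include directory entries; keep regular files only
        return [Path(dest) / name for name in zf.namelist() if not name.endswith("/")]


def docker_load_zstd(image_file: Path) -> None:
    """Python equivalent of: zstd --uncompress --stdout FILE | docker load"""
    with subprocess.Popen(
        ["zstd", "--uncompress", "--stdout", str(image_file)],
        stdout=subprocess.PIPE,
    ) as zstd:
        # docker load reads the decompressed image stream from zstd's stdout
        load = subprocess.run(["docker", "load"], stdin=zstd.stdout)
    # Leaving the Popen context closes the pipe and waits for zstd, so returncode is set
    if zstd.returncode != 0 or load.returncode != 0:
        raise RuntimeError(f"zstd rc={zstd.returncode}, docker load rc={load.returncode}")
```

For example, `for path in extract_artifact("artifact.zip", "artifact"): docker_load_zstd(path)`, where `artifact.zip` is a placeholder for the downloaded file. Chaining `Popen` into `run` reproduces the shell pipe without needing `shell=True`.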

@benz0li
Contributor

benz0li commented Apr 4, 2025

@benz0li could you maybe please test my idea?

@mathbunnyru Yes. I will get back to you soon.

@mathbunnyru
Member Author

mathbunnyru commented Apr 4, 2025

I'm happy I recently introduced a change where we first upload an image and then test it, exactly for cases like this one 🙂
#2214

@benz0li
Contributor

benz0li commented Apr 4, 2025

With GPU:

docker run --rm -ti --gpus all quay.io/jupyter/tensorflow-notebook python
Entered start.sh with args: python
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/20tensorboard-proxy-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: python
Python 3.12.9 | packaged by conda-forge | (main, Mar  4 2025, 22:48:41) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2025-04-04 18:11:24.618843: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-04 18:11:24.636563: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743790284.656221      20 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743790284.662333      20 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1743790284.679148      20 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743790284.679178      20 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743790284.679181      20 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743790284.679183      20 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-04-04 18:11:24.683904: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> 
>>> print(tf.constant("Hello, TensorFlow"))
I0000 00:00:1743790289.946154      20 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6826 MB memory:  -> device: 0, name: Quadro RTX 4000, pci bus id: 0000:af:00.0, compute capability: 7.5
tf.Tensor(b'Hello, TensorFlow', shape=(), dtype=string)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(-441.97748, shape=(), dtype=float32)

@benz0li
Contributor

benz0li commented Apr 4, 2025

Without GPU:

docker run --rm -ti --runtime runc quay.io/jupyter/tensorflow-notebook python
Entered start.sh with args: python
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/20tensorboard-proxy-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: python
Python 3.12.9 | packaged by conda-forge | (main, Mar  4 2025, 22:48:41) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2025-04-04 18:13:01.754997: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-04 18:13:01.778961: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743790381.799862       7 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743790381.806458       7 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1743790381.825437       7 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743790381.825472       7 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743790381.825477       7 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743790381.825490       7 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-04-04 18:13:01.830431: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> 
>>> print(tf.constant("Hello, TensorFlow"))
2025-04-04 18:13:05.543119: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
tf.Tensor(b'Hello, TensorFlow', shape=(), dtype=string)
>>> print(tf.reduce_sum(tf.random.normal([1000, 1000])))
tf.Tensor(-172.88205, shape=(), dtype=float32)

@mathbunnyru
Member Author

Thank you!
I can't understand from your logs, does gpu version work in non-interactive mode, or does it also fail with a non-zero rc?

@benz0li
Contributor

benz0li commented Apr 5, 2025

Thank you! I can't understand from your logs, does gpu version work in non-interactive mode, or does it also fail with a non-zero rc?

docker run --rm --gpus all quay.io/jupyter/tensorflow-notebook bash -c 'python -c "import tensorflow as tf"; echo $?'
Entered start.sh with args: bash -c python -c "import tensorflow as tf"; echo $?
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/20tensorboard-proxy-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: bash -c python -c "import tensorflow as tf"; echo $?
2025-04-05 03:45:02.839015: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-05 03:45:02.856084: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743824702.877829      47 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743824702.885568      47 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1743824702.901158      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824702.901180      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824702.901182      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824702.901184      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-04-05 03:45:02.905902: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
0

docker run --rm --gpus all quay.io/jupyter/tensorflow-notebook bash -c 'python -c "import tensorflow as tf; print(tf.constant(\"Hello, TensorFlow\"))"; echo $?'
Entered start.sh with args: bash -c python -c "import tensorflow as tf; print(tf.constant(\"Hello, TensorFlow\"))"; echo $?
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/20tensorboard-proxy-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: bash -c python -c "import tensorflow as tf; print(tf.constant(\"Hello, TensorFlow\"))"; echo $?
2025-04-05 03:46:00.235913: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-05 03:46:00.252138: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743824760.270857      47 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743824760.276591      47 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1743824760.291398      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824760.291418      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824760.291420      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824760.291422      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-04-05 03:46:00.295924: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
I0000 00:00:1743824763.655489      47 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6826 MB memory:  -> device: 0, name: Quadro RTX 4000, pci bus id: 0000:af:00.0, compute capability: 7.5
tf.Tensor(b'Hello, TensorFlow', shape=(), dtype=string)
0

docker run --rm --gpus all quay.io/jupyter/tensorflow-notebook bash -c 'python -c "import tensorflow as tf; print(tf.constant(\"Hello, TensorFlow\")); print(tf.reduce_sum(tf.random.normal([1000, 1000])))"; echo $?'
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/20tensorboard-proxy-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: bash -c python -c "import tensorflow as tf; print(tf.constant(\"Hello, TensorFlow\")); print(tf.reduce_sum(tf.random.normal([1000, 1000])))"; echo $?
2025-04-05 03:46:59.024986: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-05 03:46:59.041635: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743824819.061216      47 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743824819.067417      47 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1743824819.083184      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824819.083204      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824819.083206      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743824819.083208      47 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-04-05 03:46:59.088034: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
I0000 00:00:1743824822.481466      47 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6826 MB memory:  -> device: 0, name: Quadro RTX 4000, pci bus id: 0000:af:00.0, compute capability: 7.5
tf.Tensor(b'Hello, TensorFlow', shape=(), dtype=string)
tf.Tensor(402.03845, shape=(), dtype=float32)
0

@benz0li
Contributor

benz0li commented Apr 5, 2025

Please be aware of

docker run --rm --runtime runc quay.io/jupyter/tensorflow-notebook bash -c 'python -c "import tensorflow as tf; tf.constant(\"Hello, TensorFlow\")"; echo $?'
Entered start.sh with args: bash -c python -c "import tensorflow as tf; tf.constant(\"Hello, TensorFlow\")"; echo $?
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/20tensorboard-proxy-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: bash -c python -c "import tensorflow as tf; tf.constant(\"Hello, TensorFlow\")"; echo $?
2025-04-05 04:04:00.149511: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-04-05 04:04:00.166285: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1743825840.186139      34 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1743825840.192212      34 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
W0000 00:00:1743825840.208241      34 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743825840.208262      34 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743825840.208264      34 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
W0000 00:00:1743825840.208265      34 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.
2025-04-05 04:04:00.213691: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-04-05 04:04:03.308189: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)
0

compared to

docker run --rm --runtime runc quay.io/jupyter/tensorflow-notebook bash -c 'python -c "import notfound as tf"; echo $?'
Entered start.sh with args: bash -c python -c "import notfound as tf"; echo $?
Running hooks in: /usr/local/bin/start-notebook.d as uid: 1000 gid: 100
Done running hooks in: /usr/local/bin/start-notebook.d
Running hooks in: /usr/local/bin/before-notebook.d as uid: 1000 gid: 100
Sourcing shell script: /usr/local/bin/before-notebook.d/10activate-conda-env.sh
Sourcing shell script: /usr/local/bin/before-notebook.d/20tensorboard-proxy-env.sh
Done running hooks in: /usr/local/bin/before-notebook.d
Executing the command: bash -c python -c "import notfound as tf"; echo $?
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'notfound'
1
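The comparison comes down to the process exit status: TensorFlow's `cuInit` message is only a log line written to stderr, after which the program continues and exits with 0, whereas an uncaught exception makes Python exit with 1. A minimal Python sketch of the same check, using a stand-in child process instead of TensorFlow so it runs without a GPU; `run_python` is a hypothetical helper, and it assumes no module named `notfound` is installed.

```python
import subprocess
import sys


def run_python(code: str) -> subprocess.CompletedProcess:
    """Rough equivalent of: python -c "<code>"; echo $?  (output captured)."""
    return subprocess.run([sys.executable, "-c", code], capture_output=True, text=True)


# Stand-in for the cuInit message: an error-level log line on stderr,
# after which the program keeps going and ends normally.
logged_error = run_python(
    'import sys; sys.stderr.write("E ... failed call to cuInit\\n"); print("tf.Tensor(...)")'
)
# An uncaught exception (ModuleNotFoundError).
raised = run_python("import notfound")

print(logged_error.returncode)  # 0: stderr output alone does not change the exit status
print(raised.returncode)        # 1: an uncaught exception exits with status 1
```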

@benz0li
Contributor

benz0li commented Apr 5, 2025

I can't understand from your logs, does gpu version work

There is no error when using a GPU, only an informational message:

I0000 00:00:1743790289.946154      20 gpu_device.cc:2019] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 6826 MB memory:  -> device: 0, name: Quadro RTX 4000, pci bus id: 0000:af:00.0, compute capability: 7.5
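The leading letter in these log lines follows the glog/absl severity convention (I = INFO, W = WARNING, E = ERROR, F = FATAL), which is why the `gpu_device.cc` line counts as informational. A minimal sketch for classifying such lines, assuming only the two line shapes seen in this thread; `LogRecord` and `parse_log_line` are hypothetical names, not part of TensorFlow or absl.

```python
import re
from typing import NamedTuple, Optional

SEVERITY = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}

# absl-style: "I0000 00:00:1743790289.946154      20 gpu_device.cc:2019] Created device ..."
ABSL_LINE = re.compile(r"^([IWEF])\d{4} \S+\s+\d+ ([^\]]+)\] (.*)$")
# TF-style: "2025-04-04 18:11:24.618843: I tensorflow/core/util/port.cc:153] oneDNN ..."
TF_LINE = re.compile(r"^\d{4}-\d{2}-\d{2} [\d:.]+: ([IWEF]) ([^\]]+)\] (.*)$")


class LogRecord(NamedTuple):
    severity: str
    source: str  # file:line that emitted the message
    message: str


def parse_log_line(line: str) -> Optional[LogRecord]:
    """Return the parsed record, or None if the line matches neither format."""
    for pattern in (ABSL_LINE, TF_LINE):
        match = pattern.match(line)
        if match:
            letter, source, message = match.groups()
            return LogRecord(SEVERITY[letter], source, message)
    return None
```

Lines such as `WARNING: All log messages before absl::InitializeLog() is called are written to STDERR` match neither pattern and return `None`.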

@mathbunnyru
Member Author

Thank you, @benz0li!

My bad: it seems there is no failure (rc != 0) in either case, only a warning, and that warning is what makes the test fail.
I will allow warnings in this test.
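The thread does not show the docker-stacks test helpers, so the following is only a hypothetical sketch of the distinction drawn here: a non-zero exit status is a real failure, while warning lines in the logs can be tolerated for a specific test. `RunResult`, `check_run`, and the rule that a warning is any line starting with `WARNING` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    returncode: int
    logs: str  # combined stdout/stderr of the container (assumed)


def lines_starting_with(logs: str, prefix: str) -> list[str]:
    """Return every log line that begins with the given prefix."""
    return [line for line in logs.splitlines() if line.startswith(prefix)]


def check_run(result: RunResult, *, allow_warnings: bool = False) -> None:
    """Fail on a non-zero exit status; fail on warnings only if they are not allowed."""
    if result.returncode != 0:
        raise AssertionError(f"command failed with rc={result.returncode}")
    warnings = lines_starting_with(result.logs, "WARNING")
    if warnings and not allow_warnings:
        raise AssertionError(f"unexpected warnings: {warnings}")
```

A test that is expected to emit warnings (for example, TensorFlow on a CUDA image) would call `check_run(result, allow_warnings=True)`, while other tests keep the strict default.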

@mathbunnyru
Member Author

It seems to work; the failed builds are caused by conda-forge/libxml2-feedstock#145, which is unrelated to my change.
