
apache-spark formula seems to be broken #210638

Closed
Nicolas-Parot-Alvarez-Paidy opened this issue Mar 12, 2025 · 7 comments
Labels: user configuration (User configuration rather than a Homebrew issue)

Comments

@Nicolas-Parot-Alvarez-Paidy

Nicolas-Parot-Alvarez-Paidy commented Mar 12, 2025

brew gist-logs <formula> link OR brew config AND brew doctor output

brew gist-logs apache-spark                                            
Error: No logs.

Verification

  • My brew doctor output says Your system is ready to brew. and am still able to reproduce my issue.
  • I ran brew update and am still able to reproduce my issue.
  • I have resolved all warnings from brew doctor and that did not fix my problem.
  • I searched for recent similar issues at https://github.com/Homebrew/homebrew-core/issues?q=is%3Aissue and found no duplicates.

What were you trying to do (and why)?

Install Apache Spark to run Apache Spark code locally.

What happened (include all command output)?

The brew command completed without errors, but the installation is not functional: all the Spark commands exit immediately.
For example, spark-shell exits immediately with no error message.

What did you expect to happen?

Spark commands should work. For example:

❯ spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.5
      /_/
                        
Using Scala version 2.13.8, OpenJDK 64-Bit Server VM, 17.0.13

Step-by-step reproduction instructions (by running brew commands)

  • brew install openjdk@17
  • brew install scala@2.13
  • brew install apache-spark
  • export JAVA_HOME="/opt/homebrew/Cellar/openjdk@17/17.0.13/libexec/openjdk.jdk/Contents/Home/"
  • export SPARK_HOME="/opt/homebrew/Cellar/apache-spark/"
  • export PATH=$SPARK_HOME/bin:$PATH

I know the JDK and Scala are installed correctly and that the variables are correct, because everything works if I download the package from the official website (https://spark.apache.org/downloads.html) and point SPARK_HOME at it.

Consequently, I think something may be broken in the apache-spark formula.

@SMillerDev
Member

export SPARK_HOME="/opt/homebrew/Cellar/apache-spark/"

That's not a path that contains the install. I'd recommend setting these opt paths instead of the Cellar ones, because they are stable across updates:

export JAVA_HOME="/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home/"
export SPARK_HOME="/opt/homebrew/opt/apache-spark/"
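
The opt path is a version-independent symlink into the Cellar, which is why it survives upgrades. For example (illustrative listing; the version it points at will vary):

% ls -l /opt/homebrew/opt/apache-spark
lrwxr-xr-x ... /opt/homebrew/opt/apache-spark -> ../Cellar/apache-spark/3.5.5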

@cho-m added the "user configuration" label Mar 12, 2025
@cho-m
Member

cho-m commented Mar 12, 2025

Closing, as the formula is behaving as expected. Apache Spark will fail if you give it incorrect paths.

The formula is set up to work out of the box, so if you want to use environment variables, it is up to you to make sure they are correct.

@cho-m closed this as not planned Mar 12, 2025
@Nicolas-Parot-Alvarez-Paidy
Author

That's not a path that contains the install. I'd recommend setting these opt paths instead of the Cellar ones, because they are stable across updates:

This is the path that is logged when you run the installation command and brew info apache-spark:

==> Installing apache-spark
==> Pouring apache-spark--3.5.5.all.bottle.tar.gz
🍺 /opt/homebrew/Cellar/apache-spark/3.5.5: 1,823 files, 423.7MB

So that's not obvious, I think. Could the "stable path" be surfaced better?

The formula is set up to work out of the box, so if you want to use environment variables, it is up to you to make sure they are correct.

  1. Unless I missed something, the formula does not set SPARK_HOME, nor does it modify PATH. So it doesn't work out of the box; you need to set those up yourself.
  2. I have now tried with the path recommended by @SMillerDev, and it still fails silently, so I don't think the issue is solved. Have you tried installing and running spark-shell as I described above?

I'm on M2 Pro with Sequoia 15.3.1 if that helps.

My setup works because I downloaded Spark manually, so I am trying to help the next people who run into this.

@gromgit
Contributor

gromgit commented Mar 13, 2025

Unless I missed something, the formula does not set SPARK_HOME, nor does it modify PATH. So it doesn't work out of the box; you need to set those up yourself.

Incorrect:

% brew install apache-spark
==> Downloading https://ghcr.io/v2/homebrew/core/apache-spark/manifests/3.5.5
############################################################################################################################################################################### 100.0%
==> Fetching dependencies for apache-spark: openjdk@17
==> Downloading https://ghcr.io/v2/homebrew/core/openjdk/17/manifests/17.0.14-1
############################################################################################################################################################################### 100.0%
==> Fetching openjdk@17
==> Downloading https://ghcr.io/v2/homebrew/core/openjdk/17/blobs/sha256:eb099d9774ea93a59997a45931ae4f0da6e19b150e9f291ab369a18fb7f28c67
############################################################################################################################################################################### 100.0%
==> Verifying attestation for openjdk@17
==> Fetching apache-spark
==> Downloading https://ghcr.io/v2/homebrew/core/apache-spark/blobs/sha256:ee6cf6111d441cd77534189b45edf345a560aff75a2a547ea223d3ccb613f2af
############################################################################################################################################################################### 100.0%
==> Verifying attestation for apache-spark
==> Installing dependencies for apache-spark: openjdk@17
==> Installing apache-spark dependency: openjdk@17
==> Downloading https://ghcr.io/v2/homebrew/core/openjdk/17/manifests/17.0.14-1
Already downloaded: /Volumes/aho/Library/Caches/Homebrew/downloads/bb1f4418cd7cb469ab902ea185d33a07f6a837f58930b575bb669712613e0623--openjdk@17-17.0.14-1.bottle_manifest.json
==> Pouring openjdk@17--17.0.14.arm64_sonoma.bottle.1.tar.gz
🍺  /opt/homebrew/Cellar/openjdk@17/17.0.14: 636 files, 304.3MB
==> Installing apache-spark
==> Pouring apache-spark--3.5.5.all.bottle.tar.gz
🍺  /opt/homebrew/Cellar/apache-spark/3.5.5: 1,823 files, 423.7MB
==> Running `brew cleanup apache-spark`...
Disable this behaviour by setting HOMEBREW_NO_INSTALL_CLEANUP.
Hide these hints with HOMEBREW_NO_ENV_HINTS (see `man brew`).

% declare -p PATH JAVA_HOME SPARK_HOME
declare: no such variable: JAVA_HOME
declare: no such variable: SPARK_HOME
export -T PATH path=( /Users/aho/bin /Users/aho/bin/Darwin /Users/aho/bin/Darwin-arm64 /Users/aho/go/bin /Users/aho/.local/bin /Users/aho/perl5/bin /opt/homebrew/bin /opt/homebrew/sbin /usr/local/bin /System/Cryptexes/App/usr/bin /usr/bin /bin /usr/sbin /sbin /opt/X11/bin /Library/Apple/usr/bin )

% which spark-shell
/opt/homebrew/bin/spark-shell

% spark-shell   
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/03/13 09:24:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://rover-airm2.03s.net:4040
Spark context available as 'sc' (master = local[*], app id = local-1741829069449).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.5
      /_/
         
Using Scala version 2.12.18 (OpenJDK 64-Bit Server VM, Java 17.0.14)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 

Worked for me out of the box from a fresh Homebrew install: no PATH manipulation needed, no need to set JAVA_HOME or SPARK_HOME. It broke for you because you set SPARK_HOME incorrectly, when you didn't have to set it (or JAVA_HOME) at all.

My setup works because I downloaded Spark manually, so I am trying to help the next people who run into this.

Sorry, but you're actually confusing everyone by claiming that you need to set additional environment variables in an unnecessary and incorrect way. If you had to set these variables for this (or any) formula to work, the necessary instructions would be printed in a Caveats section after brew install apache-spark, and also shown by brew info apache-spark.

If you're following online instructions that tell you to set these variables, note that they do so because manually downloaded binaries can live anywhere in your filesystem, which is probably not where the hardcoded default SPARK_HOME value points. That's why you need to set it for manual downloads: so the programs know where to find the necessary components. Homebrew always puts formula binaries in fixed locations, so the default SPARK_HOME value hardcoded into those binaries is always correct for a Homebrew installation, and setting it is therefore completely unnecessary.
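
You can check this for yourself by looking for a Caveats section (illustrative command; apache-spark currently prints none, meaning no extra variables are needed):

% brew info apache-spark | grep -A2 Caveats
%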

@Nicolas-Parot-Alvarez-Paidy
Author

Nicolas-Parot-Alvarez-Paidy commented Mar 13, 2025

Thank you for trying to reproduce.
I guess it could be that I had pre-existing environment variables, since setting them has been the default way to install these Apache tools for years, as documented on the official Spark website: https://spark.apache.org/docs/latest/.
I will try again by making sure I have no related env var set and report back.

@carlocab
Member

This is the path that is logged when you run the installation command and brew info apache-spark:

==> Installing apache-spark
==> Pouring apache-spark--3.5.5.all.bottle.tar.gz
🍺 /opt/homebrew/Cellar/apache-spark/3.5.5: 1,823 files, 423.7MB

The path shown is

/opt/homebrew/Cellar/apache-spark/3.5.5

but you used

export SPARK_HOME="/opt/homebrew/Cellar/apache-spark/"

Note the missing 3.5.5.

So that's not obvious, I think. Could the "stable path" be surfaced better?

Perhaps, but it's not really clear how/where/when to.

If it's any easier, you could always do something like

export SPARK_HOME="$(brew --prefix apache-spark)"

and that always uses the stable path.
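
e.g., with the default prefix on Apple Silicon (illustrative output):

% brew --prefix apache-spark
/opt/homebrew/opt/apache-spark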

@cho-m
Member

cho-m commented Mar 13, 2025

spark-shell should not fail silently. If you use a wrong path, it should tell you in an obvious manner:

JAVA_HOME=/some/incorrect/path spark-shell --version
/opt/homebrew/Cellar/apache-spark/3.5.5/libexec/bin/spark-class: line 71: /some/incorrect/path/bin/java: No such file or directory
/opt/homebrew/Cellar/apache-spark/3.5.5/libexec/bin/spark-class: line 97: CMD: bad array subscript
head: illegal line count -- -1

SPARK_HOME=/some/incorrect/path spark-shell --version
/opt/homebrew/Cellar/apache-spark/3.5.5/libexec/bin/spark-shell: line 60: /some/incorrect/path/bin/spark-submit: No such file or directory

Spark's own commands (e.g. spark-shell, pyspark) are set up to calculate SPARK_HOME themselves, while Homebrew sets up JAVA_HOME for the bin scripts.
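
Roughly, each launcher resolves its own location when SPARK_HOME is unset. A minimal sketch (illustrative, not Spark's exact code):

# in libexec/bin/spark-shell and friends
if [ -z "${SPARK_HOME}" ]; then
  # derive SPARK_HOME from where this script actually lives
  export SPARK_HOME="$(cd "$(dirname "$0")/.." && pwd)"
fi

So leaving SPARK_HOME unset is fine; setting it to a wrong path overrides the computed value and breaks the scripts.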


I guess the question is whether there are common scenarios where Homebrew users need SPARK_HOME defined. If so, we could add something like dotnet's caveats to clarify the path:

def caveats
  <<~TEXT
    For other software to find dotnet you may need to set:
      export DOTNET_ROOT="#{opt_libexec}"
  TEXT
end

The path you are looking for is SPARK_HOME=$(brew --prefix apache-spark)/libexec
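
A minimal sketch of the analogous caveats block for apache-spark (hypothetical; not in the formula today):

# Hypothetical caveats for apache-spark, mirroring dotnet's;
# opt_libexec resolves to $(brew --prefix apache-spark)/libexec.
def caveats
  <<~TEXT
    For other software to find Spark you may need to set:
      export SPARK_HOME="#{opt_libexec}"
  TEXT
end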
