Hello,
I'm trying to run the following in a Databricks Scala notebook:
```scala
val tweetsSentimentList = tweetsSentimentsRdd.collect()
```
after running this:
```scala
val tweetsSentimentsRdd = sc.textFile("/FileStore/tables/tweets-30.04.21.txt").filter(removeHttpLines).map(x => requestFormatter(x)).map(y => sendPostRequest("myUrl", "myKey", y))
```
after defining these three functions:
```scala
// Wraps a single tweet in the JSON body sent to the text analytics endpoint
def requestFormatter(givenTweet: String): String = {
  s"""{
    "documents": [
      {
        "language": "en",
        "id": 1,
        "text": "${givenTweet}"
      }
    ]
  }"""
}

// POSTs the JSON body and returns the raw response body
def sendPostRequest(textAnalyticsUrl: String, subscriptionKey: String, requestBody: String): String = {
  import scalaj.http.Http
  Thread.sleep(3000) // crude throttling: wait 3 s before every request
  val result = Http(textAnalyticsUrl)
    .postData(requestBody)
    .header("Content-Type", "application/json")
    .header("Ocp-Apim-Subscription-Key", subscriptionKey)
    .asString
  result.body
}

// Keeps only lines that do not start with "http"
def removeHttpLines(textLine: String): Boolean = {
  val pattern = "^http".r
  pattern.findFirstIn(textLine) match {
    case Some(_) => false
    case None    => true
  }
}
```
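Separately from the error, `requestFormatter` interpolates the raw tweet into the JSON, so a tweet containing a double quote, backslash or newline produces an invalid request body. A minimal sketch of escaping the text first; `JsonEscape` and `escapeJson` are hypothetical names, and a JSON library would do the same job:

```scala
// Hypothetical helper, not part of any library used in this notebook.
object JsonEscape {
  // Escapes a string so it can sit inside a JSON string literal.
  def escapeJson(s: String): String = {
    val sb = new StringBuilder(s.length + 16)
    s.foreach {
      case '"'  => sb.append("\\\"")
      case '\\' => sb.append("\\\\")
      case '\n' => sb.append("\\n")
      case '\r' => sb.append("\\r")
      case '\t' => sb.append("\\t")
      case '\b' => sb.append("\\b")
      case '\f' => sb.append("\\f")
      case c if c < ' ' =>
        // Remaining control characters: backslash, 'u', then four hex digits.
        sb.append('\\').append('u').append("%04x".format(c.toInt))
      case c => sb.append(c)
    }
    sb.toString
  }

  // Same request shape as requestFormatter above, but with the tweet escaped.
  def requestFormatter(givenTweet: String): String =
    s"""{"documents":[{"language":"en","id":1,"text":"${escapeJson(givenTweet)}"}]}"""
}
```

For example, `JsonEscape.requestFormatter("say \"hi\"")` keeps the quotes as `\"` inside the `text` field instead of closing the string early.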
I get this error:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 2, localhost, executor driver): java.lang.NoClassDefFoundError: Could not initialize class scalaj.http.DefaultConnectFunc$
```
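"Could not initialize class" is what the JVM reports when a class's static initializer has already failed once in that JVM; the first failure is an `ExceptionInInitializerError` whose cause holds the real error. A minimal sketch of surfacing that cause, assuming the request is tried once on the driver of a freshly restarted cluster; `Diagnostics`, `causeChain` and `describeFailure` are hypothetical helpers, not part of scalaj-http or Spark:

```scala
// Hypothetical diagnostic helpers, not part of scalaj-http or Spark.
object Diagnostics {
  // Follows getCause links (capped at maxDepth), starting from t itself.
  def causeChain(t: Throwable, maxDepth: Int = 20): List[Throwable] =
    Iterator.iterate(t)(_.getCause).takeWhile(_ != null).take(maxDepth).toList

  // Innermost exception in the chain, usually the one that explains the failure.
  def rootCause(t: Throwable): Throwable = causeChain(t).last

  // Evaluates body once; on failure returns every message in the cause chain.
  def describeFailure[A](body: => A): Either[List[String], A] =
    try Right(body)
    catch {
      // Catch Throwable on purpose: ExceptionInInitializerError and
      // NoClassDefFoundError are LinkageErrors, which scala.util.Try rethrows.
      case t: Throwable => Left(causeChain(t).map(_.toString))
    }
}
```

In the notebook this could wrap a single call such as `Diagnostics.describeFailure(scalaj.http.Http("myUrl").postData("{}").asString.code)` before running the RDD job; `scala.util.Try` is avoided because it does not catch `LinkageError` subclasses.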
I have installed the Maven package scalaj-http_2.12-2.3.0 on my cluster, because with a newer version I got a version error: "expected 5, found 5.2".
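The "expected 5, found 5.2" message looks like Scala's signature-version check: as far as I know, Scala 2.12 writes signature version 5.0 and Scala 2.13 writes 5.2, so the newer package may have been a `_2.13` build loaded on a Scala 2.12 cluster. A small sketch that compares the notebook's Scala version (from `scala.util.Properties.versionNumberString`) with the `_2.xx` suffix of an artifact name; `ScalaVersionCheck` is a hypothetical helper:

```scala
// Hypothetical helper: checks that a Maven artifact's _2.xx suffix matches
// the Scala binary version of the running notebook/cluster.
object ScalaVersionCheck {
  // Captures "2.12" from "scalaj-http_2.12" or "scalaj-http_2.12-2.3.0".
  private val SuffixPattern = """.*_(\d+\.\d+)(?:-.*)?""".r

  // "2.12.10" -> "2.12"
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // None when the name carries no _2.xx suffix.
  def artifactSuffix(artifact: String): Option[String] = artifact match {
    case SuffixPattern(v) => Some(v)
    case _                => None
  }

  // Defaults to the Scala version this code is running on.
  def matches(artifact: String,
              scalaVersion: String = scala.util.Properties.versionNumberString): Boolean =
    artifactSuffix(artifact).contains(binaryVersion(scalaVersion))
}
```

Running `ScalaVersionCheck.matches("scalaj-http_2.12-2.3.0")` in a notebook cell should return `true` only when the cluster itself runs Scala 2.12.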