2017-05-26
6

I'm using Maven with the Scala archetype, and I'm getting the following error:

“value $ is not a member of StringContext”

I've already tried adding a few things to my pom.xml, but nothing has worked.

My code:

import org.apache.spark.ml.evaluation.RegressionEvaluator 
import org.apache.spark.ml.regression.LinearRegression 
import org.apache.spark.ml.tuning.{ParamGridBuilder, TrainValidationSplit} 
// To see less warnings 
import org.apache.log4j._ 
Logger.getLogger("org").setLevel(Level.ERROR) 


// Start a simple Spark Session 
import org.apache.spark.sql.SparkSession 
val spark = SparkSession.builder().getOrCreate() 

// Prepare training and test data. 
val data = spark.read.option("header","true").option("inferSchema","true").format("csv").load("USA_Housing.csv") 

// Check out the Data 
data.printSchema() 

// See an example of what the data looks like 
// by printing out a Row 
val colnames = data.columns 
val firstrow = data.head(1)(0) 
println("\n") 
println("Example Data Row") 
for(ind <- Range(1,colnames.length)){ 
    println(colnames(ind)) 
    println(firstrow(ind)) 
    println("\n") 
} 

//////////////////////////////////////////////////// 
//// Setting Up DataFrame for Machine Learning //// 
////////////////////////////////////////////////// 

// A few things we need to do before Spark can accept the data! 
// It needs to be in the form of two columns 
// ("label","features") 

// This will allow us to join multiple feature columns 
// into a single column of an array of feature values 
import org.apache.spark.ml.feature.VectorAssembler 
import org.apache.spark.ml.linalg.Vectors 

// Rename Price to label column for naming convention. 
// Grab only numerical columns from the data 
val df = data.select(data("Price").as("label"),$"Avg Area Income",$"Avg Area House Age",$"Avg Area Number of Rooms",$"Area Population") 

// An assembler converts the input values to a vector 
// A vector is what the ML algorithm reads to train a model 

// Set the input columns from which we are supposed to read the values 
// Set the name of the column where the vector will be stored 
val assembler = new VectorAssembler().setInputCols(Array("Avg Area Income","Avg Area House Age","Avg Area Number of Rooms","Area Population")).setOutputCol("features") 

// Use the assembler to transform our DataFrame to the two columns 
val output = assembler.transform(df).select($"label",$"features") 


// Create a Linear Regression Model object 
val lr = new LinearRegression() 

// Fit the model to the data 

// Note: Later we will see why we should split 
// the data first, but for now we will fit to all the data. 
val lrModel = lr.fit(output) 

// Print the coefficients and intercept for linear regression 
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}") 

// Summarize the model over the training set and print out some metrics! 
// Explore this in the spark-shell for more methods to call 
val trainingSummary = lrModel.summary 

println(s"numIterations: ${trainingSummary.totalIterations}") 
println(s"objectiveHistory: ${trainingSummary.objectiveHistory.toList}") 

trainingSummary.residuals.show() 

println(s"RMSE: ${trainingSummary.rootMeanSquaredError}") 
println(s"MSE: ${trainingSummary.meanSquaredError}") 
println(s"r2: ${trainingSummary.r2}") 

and my pom.xml is:

<project xmlns="http://maven.apache.org/POM/4.0.0" 

xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd"> 
    <modelVersion>4.0.0</modelVersion> 
    <groupId>test</groupId> 
    <artifactId>outrotest</artifactId> 
    <version>1.0-SNAPSHOT</version> 
    <name>${project.artifactId}</name> 
    <description>My wonderfull scala app</description> 
    <inceptionYear>2015</inceptionYear> 
    <licenses> 
    <license> 
     <name>My License</name> 
     <url>http://....</url> 
     <distribution>repo</distribution> 
    </license> 
    </licenses> 

    <properties> 
    <maven.compiler.source>1.6</maven.compiler.source> 
    <maven.compiler.target>1.6</maven.compiler.target> 
    <encoding>UTF-8</encoding> 
    <scala.version>2.11.5</scala.version> 
    <scala.compat.version>2.11</scala.compat.version> 
    </properties> 

    <dependencies> 
    <dependency> 
     <groupId>org.scala-lang</groupId> 
     <artifactId>scala-library</artifactId> 
     <version>${scala.version}</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-mllib_2.11</artifactId> 
     <version>2.0.1</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-core_2.11</artifactId> 
     <version>2.0.1</version> 
    </dependency> 
    <dependency> 
     <groupId>org.apache.spark</groupId> 
     <artifactId>spark-sql_2.11</artifactId> 
     <version>2.0.2</version> 
    </dependency> 
    <dependency> 
     <groupId>com.databricks</groupId> 
     <artifactId>spark-csv_2.11</artifactId> 
     <version>1.5.0</version> 
    </dependency> 

    <!-- Test --> 
    <dependency> 
     <groupId>junit</groupId> 
     <artifactId>junit</artifactId> 
     <version>4.11</version> 
     <scope>test</scope> 
    </dependency> 
    <dependency> 
     <groupId>org.specs2</groupId> 
     <artifactId>specs2-junit_${scala.compat.version}</artifactId> 
     <version>2.4.16</version> 
     <scope>test</scope> 
    </dependency> 
    <dependency> 
     <groupId>org.specs2</groupId> 
     <artifactId>specs2-core_${scala.compat.version}</artifactId> 
     <version>2.4.16</version> 
     <scope>test</scope> 
    </dependency> 
    <dependency> 
     <groupId>org.scalatest</groupId> 
     <artifactId>scalatest_${scala.compat.version}</artifactId> 
     <version>2.2.4</version> 
     <scope>test</scope> 
    </dependency> 
    </dependencies> 

    <build> 
    <sourceDirectory>src/main/scala</sourceDirectory> 
    <testSourceDirectory>src/test/scala</testSourceDirectory> 
    <plugins> 
     <plugin> 
     <!-- see http://davidb.github.com/scala-maven-plugin --> 
     <groupId>net.alchim31.maven</groupId> 
     <artifactId>scala-maven-plugin</artifactId> 
     <version>3.2.0</version> 
     <executions> 
      <execution> 
      <goals> 
       <goal>compile</goal> 
       <goal>testCompile</goal> 
      </goals> 
      <configuration> 
       <args> 
       <!--<arg>-make:transitive</arg>--> 
       <arg>-dependencyfile</arg> 
       <arg>${project.build.directory}/.scala_dependencies</arg> 
       </args> 
      </configuration> 
      </execution> 
     </executions> 
     </plugin> 
     <plugin> 
     <groupId>org.apache.maven.plugins</groupId> 
     <artifactId>maven-surefire-plugin</artifactId> 
     <version>2.18.1</version> 
     <configuration> 
      <useFile>false</useFile> 
      <disableXmlReport>true</disableXmlReport> 
      <!-- If you have classpath issue like NoDefClassError,... --> 
      <!-- useManifestOnlyJar>false</useManifestOnlyJar --> 
      <includes> 
      <include>**/*Test.*</include> 
      <include>**/*Suite.*</include> 
      </includes> 
     </configuration> 
     </plugin> 
    </plugins> 
    </build> 
</project> 

I have no idea how to fix this. Does anyone have an idea?


+1

Did you try import sqlContext.implicits._? –

+0

Yes, but it doesn't work. The same error persists: “value $ is not a member of StringContext” – Thaise

+0

You will need to remove spark-csv from the pom.xml, because it will cause a runtime error. – eliasah
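
(For context: Spark 2.x ships a built-in CSV data source, so the external com.databricks:spark-csv artifact is redundant there and can conflict with it at runtime. The read from the question keeps working without it; a sketch using the built-in shorthand:)

val data = spark.read.option("header", "true").option("inferSchema", "true").csv("USA_Housing.csv")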

Answers

17

Just add the implicits import right after creating the SparkSession, and it will work:

val spark = SparkSession.builder().getOrCreate()  
import spark.implicits._ // << add this 
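
The $"..." syntax comes from the StringToColumn implicit that spark.implicits._ puts in scope, so it only compiles after that import. A minimal sketch of the fix applied to the question's code (master("local[*]") is added here only so the sketch runs standalone):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._ // brings the $"..." (StringToColumn) syntax into scope

val data = spark.read.option("header", "true").option("inferSchema", "true").csv("USA_Housing.csv")
val df = data.select(data("Price").as("label"), $"Avg Area Income", $"Area Population")
df.printSchema()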
+0

Hey @Thaise, mark the answer as the accepted one –

5

You can use the col function instead; just import it,

import org.apache.spark.sql.functions.col 

and then change $"column" into col("column"). Hope it helps.
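
Applied to the select from the question, that looks like this (a sketch reusing the question's column names):

import org.apache.spark.sql.functions.col

val df = data.select(
  col("Price").as("label"),
  col("Avg Area Income"),
  col("Avg Area House Age"),
  col("Avg Area Number of Rooms"),
  col("Area Population"))

Since col is a plain function, this variant needs no implicits in scope, which is why it sidesteps the StringContext error.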

0

Purva's answer worked for me at first (the error disappeared in IntelliJ), but it later resulted in "Could not find implicit value for spark" during the sbt compile phase. A workaround that worked for me was to import the implicits from the SparkSession referenced by one's DataFrame, instead of from the session obtained with getOrCreate:

import df.sparkSession.implicits._ 

where df is a DataFrame. This may be because my code was placed inside a case class that takes an implicit val spark: SparkSession parameter, but I'm not sure why this fix worked for me.
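
A minimal sketch of that situation (the Trainer class and its labeled method are hypothetical, made up to illustrate the workaround):

import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical wrapper that receives the session as an implicit parameter.
case class Trainer(df: DataFrame)(implicit spark: SparkSession) {
  // Per the answer above: importing from the DataFrame's own session
  // avoided "Could not find implicit value for spark" under sbt.
  import df.sparkSession.implicits._

  def labeled(): DataFrame = df.select($"Price".as("label"))
}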