[spark] AWS-EMR에서 스파크와 카프카 연동에서 발생한 Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource 오류 해결 방법

티스토리 뷰

빅데이터/spark

[spark] AWS-EMR에서 스파크와 카프카 연동에서 발생한 Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource 오류 해결 방법

hs_seo 2019. 10. 14. 15:19

AWS EMR에서 스파크와 카프카를 연동할 때 EMR은 카프카 라이브러리를 기본적으로 제공하지 않기 때문에 다음과 같은 오류가 발생합니다.

scala> val df = spark.readStream.format("kafka").option("kafka.bootstrap.servers","localhost:9092").option("subscribe","test").load()
java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects
  at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:148)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:79)
  at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:79)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:218)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:80)
  at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:80)
  at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
  at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)
  ... 48 elided
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
  at scala.reflect.internal.util.AbstractFileClassLoader.findClass(AbstractFileClassLoader.scala:62)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5$$anonfun$apply$1.apply(DataSource.scala:132)
  at scala.util.Try$.apply(Try.scala:192)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$5.apply(DataSource.scala:132)
  at scala.util.Try.orElse(Try.scala:84)
  at org.apache.spark.sql.execution.datasources.DataSource.lookupDataSource(DataSource.scala:132)
  ... 55 more

scala>

메이븐에서 스파크 버전에 맞는 카프카 라이브러리를 다운로드 해서 추가해 주면 됩니다. 아래의 위치에서 필요한 파일을 다운로드 받아서 스파크 쉘을 실행할 때 jar 파일을 추가해 주면 됩니다.

해결방법

스파크 쉘을 실행할 때 kafka jar파일을 추가하면 됩니다.

$ spark-shell --jars spark-sql-kafka-0-10_2.11-2.0.2.jar

https://mvnrepository.com/artifact/org.apache.spark/spark-sql-kafka-0-10_2.11

Maven Repository: org.apache.spark » spark-sql-kafka-0-10

Kafka 0.10+ Source For Structured Streaming VersionScalaRepositoryUsagesDate2.4.x2.4.42.11Central1Aug, 20192.4.32.11Central19May, 20192.4.22.11Central2Apr, 20192.4.12.11Central1Apr, 20192.4.02.11Central5Oct, 20182.3.x2.3.42.11Central 0 Sep, 20192.3.32.11Ce

mvnrepository.com

저작자표시 비영리 동일조건

'빅데이터 > spark' 카테고리의 다른 글

[spark] Unable to instantiate SparkSession with Hive support because Hive classes are not found. 오류 해결 방법 (0)	2019.11.08
[spark] AWS EMR에서 Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer 오류 발생시 해결 방법 (0)	2019.10.14
[spark] <console>:23: error: overloaded method value option with alternatives: 오류 (0)	2019.10.14
[spark-dataframe] 데이터 프레임에 새로운 칼럼 추가 (0)	2019.08.08
[spark] MLib 라이브러리 (0)	2019.04.11

공지사항

최근에 올라온 글

최근에 달린 댓글

Total

Today

Yesterday

링크

TAG more

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30

글 보관함

개발자로 살아남기

티스토리 뷰