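While running a Spark job, you may occasionally hit an error like the one in the stack trace below.
The task fails while reading a broadcast block (TorrentBroadcast.readBroadcastBlock) inside a ShuffleMapTask, and my guess is that it is caused by running out of memory during shuffle processing.
In that case, raising spark.sql.shuffle.partitions above its default of 200 spreads the shuffle data over more, smaller partitions.
I resolved it by adding the following settings (the full stack trace follows them).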
spark.sql.shuffle.partitions=300
spark.default.parallelism=300
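As a rough illustration of how these two settings can be applied and why they may help, here is a minimal Python sketch. It builds a spark-submit command line with one `--conf` flag per setting and estimates the average amount of shuffle data per partition. The application name `my_job.jar`, the helper names, and the 60,000 MB shuffle size are hypothetical and not from the original job; the even split across partitions is also an assumption, since skewed keys would produce larger partitions.

```python
import shlex

# The two settings from this post.
SHUFFLE_SETTINGS = {
    "spark.sql.shuffle.partitions": "300",
    "spark.default.parallelism": "300",
}


def spark_submit_command(app, settings):
    """Build a spark-submit argument list with one --conf flag per setting."""
    args = ["spark-submit"]
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    args.append(app)
    return args


def avg_partition_mb(shuffle_mb, partitions):
    """Average shuffle data per partition, assuming evenly distributed keys."""
    return shuffle_mb / partitions


# "my_job.jar" is a hypothetical application name.
command = spark_submit_command("my_job.jar", SHUFFLE_SETTINGS)
print(shlex.join(command))

# Hypothetical 60,000 MB shuffle: compare the default of 200 partitions with 300.
for partitions in (200, 300):
    print(partitions, "partitions:", avg_partition_mb(60000, partitions), "MB each")
```

Passing the values at submit time matters mostly for spark.default.parallelism, which is read when the SparkContext is created; spark.sql.shuffle.partitions can also be changed on a running session.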
diagnostics: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 436: java.io.IOException: java.util.NoSuchElementException
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1283)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:174)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:65)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:89)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:72)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException
at org.apache.spark.util.collection.PrimitiveVector$$anon$1.next(PrimitiveVector.scala:58)
at org.apache.spark.storage.memory.PartiallyUnrolledIterator.next(MemoryStore.scala:697)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:30)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$2.apply(TorrentBroadcast.scala:178)
at scala.Option.map(Option.scala:146)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1276)