Spark Memory Model

Spark is a distributed data processing framework. It works best with huge volumes of data (what we commonly call Big Data). Because we deal with enormous amounts of data, it is very important to understand the memory model of the framework, which gives you much better control over how data is processed. Here we will assume Spark is running on top of the YARN resource manager. Spark has two main components where memory is a concern:

Driver
Executor

Driver

The driver is where all the local computations happen. Sometimes too much data is collected to the driver from the executors, or a heavy local computation is performed there. These kinds of memory-intensive activities can slow down or break your application. The driver has two memory divisions, i.e. driver overhead and driver memory. Driver overhead is an amount of off-heap memory (in megabytes). This memory accounts for JVM overheads, interned s...
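To make these two driver settings concrete, here is a minimal sketch in Scala of how the memory knobs are typically supplied. The configuration keys (spark.driver.memory, spark.driver.memoryOverhead, and their executor counterparts) are standard Spark-on-YARN settings, but the sizes shown are illustrative assumptions only; in practice the driver values are usually passed via spark-submit --conf so they take effect before the driver JVM starts.

// Minimal sketch: sizing driver and executor memory for Spark on YARN.
// The values below are assumptions for illustration; tune them to your workload.
import org.apache.spark.sql.SparkSession

object MemoryModelDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("memory-model-demo")
      // Heap for the driver JVM (where collected results and local computation live).
      .config("spark.driver.memory", "4g")
      // Off-heap overhead for the driver container (JVM overheads, interned strings, native buffers).
      .config("spark.driver.memoryOverhead", "512m")
      // Heap for each executor JVM.
      .config("spark.executor.memory", "8g")
      // Off-heap overhead per executor container.
      .config("spark.executor.memoryOverhead", "1g")
      .getOrCreate()

    // Collecting or aggregating a large dataset on the driver is exactly the kind
    // of operation that puts pressure on spark.driver.memory.
    val count = spark.range(0, 1000000).count()
    println(s"count = $count")

    spark.stop()
  }
}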