Background
本文为阅读Spark The Definitive Guide Chapter 12所做的归纳与整理
Definition
- Immutable, Partitioned Collection Of Records
- No Concept Of Rows in RDD, individual Records are just Java/Scala/Python Objects. There are no schema in RDDs.
- All the code in spark compiles down to RDD
- Spark’s Structured API automatically store data in an optimized, compressed binary format while you need to implement this format inside your objects manually