Describe the enhancement requested
This optimization reduces redundant getFileStatus() RPC calls to the HDFS
NameNode when processing Parquet files.
Benefits:
- Saves one NameNode RPC call per Parquet file during split generation
- For workloads with thousands of small files, this can significantly reduce
NameNode pressure and improve job startup time
- Maintains full backward compatibility with existing code paths
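
The request above does not spell out the mechanism, so the following is only a rough sketch of one plausible reading: reuse the file status that a directory listing already returned instead of looking each file up again by path. Everything here is hypothetical and for illustration only. `Status` and `CountingFs` are in-memory stand-ins, not Hadoop or Parquet classes, and the counter simply models one NameNode RPC per `getFileStatus()` call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StatusReuseSketch {

    /** Hypothetical stand-in for a file status object: a path plus its length. */
    static final class Status {
        final String path;
        final long length;

        Status(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** Hypothetical in-memory file system that counts per-file status lookups. */
    static final class CountingFs {
        private final Map<String, Long> files = new LinkedHashMap<>();
        int statusCalls = 0;

        void add(String path, long length) {
            files.put(path, length);
        }

        /** A single listing call returns the status of every file. */
        List<Status> listStatus() {
            List<Status> out = new ArrayList<>();
            for (Map.Entry<String, Long> e : files.entrySet()) {
                out.add(new Status(e.getKey(), e.getValue()));
            }
            return out;
        }

        /** Models the per-file RPC that the enhancement wants to avoid. */
        Status getFileStatus(String path) {
            statusCalls++;
            return new Status(path, files.get(path));
        }
    }

    /** Builds a file system holding fileCount small files. */
    static CountingFs sampleFs(int fileCount) {
        CountingFs fs = new CountingFs();
        for (int i = 0; i < fileCount; i++) {
            fs.add("part-" + i + ".parquet", 1024L + i);
        }
        return fs;
    }

    /**
     * Sums file lengths, as split generation might when sizing splits.
     * With reuseStatus=false every file is looked up again by path;
     * with reuseStatus=true the status from the listing is used directly.
     */
    static long totalLength(CountingFs fs, boolean reuseStatus) {
        long total = 0;
        for (Status listed : fs.listStatus()) {
            Status status = reuseStatus ? listed : fs.getFileStatus(listed.path);
            total += status.length;
        }
        return total;
    }

    public static void main(String[] args) {
        CountingFs before = sampleFs(1000);
        totalLength(before, false);
        CountingFs after = sampleFs(1000);
        totalLength(after, true);
        System.out.println("lookups without reuse: " + before.statusCalls);
        System.out.println("lookups with reuse: " + after.statusCalls);
    }
}
```

In this toy model both paths compute the same total, and the reuse path skips one lookup per file. Whether the real change works this way depends on the actual code path being modified.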
Component(s)
Core