#data-skew

Data science
from HackerNoon
2 years ago

Deep Dive into MS MARCO Web Search: Unpacking Dataset Characteristics | HackerNoon

The MS MARCO dataset reveals considerable multilingual disparity and significant data skew, highlighting challenges in model evaluation and training.
Data science
from Medium
2 months ago

Apache Spark: Fix data skew issue using salting technique (practical example)

Data skew leads to performance issues in Spark when certain keys dominate the distribution during shuffles.
Salting can effectively reduce data skew by distributing heavy keys across multiple partitions.
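
The salting idea summarized above can be illustrated without a Spark cluster. The sketch below is a plain-Python simulation, not the linked article's code: `partition_of`, `partition_sizes`, `salt_keys`, the partition count, and the salt count are all hypothetical names and values chosen for illustration, and the round-robin salt stands in for the random salt more commonly used in practice.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # assumed number of shuffle partitions
NUM_SALTS = 8       # assumed number of salt values per key


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Hash-partition a key, loosely mimicking how a shuffle assigns rows."""
    # crc32 is used instead of hash() so results are stable across runs.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions: int = NUM_PARTITIONS):
    """Return the number of rows that land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts: int = NUM_SALTS):
    """Append a salt suffix so one heavy key becomes several distinct keys."""
    # Round-robin salt keeps the demo deterministic; real jobs often use
    # a random integer in [0, num_salts) instead.
    return [f"{k}_{i % num_salts}" for i, k in enumerate(keys)]


# Skewed input: one "hot" key owns 90% of the rows.
keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]

before = partition_sizes(keys)
after = partition_sizes(salt_keys(keys))

print("rows per partition before salting:", before)
print("rows per partition after salting: ", after)
```

Salting changes the key, so the rest of the job has to account for it: an aggregation typically runs in two stages (first on the salted key, then on the original key), and a join needs the other side replicated once per salt value so that every salted key still finds its match.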