Behind the Scenes: Data Collection and Statistical Tools for NFT Bias Study | HackerNoon
Briefly

The article details a methodology for collecting and analyzing NFT data from the OpenSea marketplace, focusing on gender-labeled collections. The researchers extracted metadata from a wide range of NFT collections through a detailed querying process and used the PySpark framework to retrieve and analyze transaction records. Approximately 2.5 million NFTs were examined, with a specific focus on collections that offer gender labels. The study aims to provide insight into demographic representation within the NFT market and also explores the challenges of identifying race among NFTs.
Our dataset only includes NFTs transacted on OpenSea, which is the primary marketplace for NFTs on the Ethereum blockchain.
We retrieve collection metadata and each individual NFT's metadata and last sale price.
In the end, we obtain a dataset of about 2.5 million NFTs, each of which has been transacted.
To get gender-labelled data, we select collections whose metadata contains the words 'male' and 'female'.
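The selection step can be illustrated with a minimal sketch. It is not the authors' pipeline (the article names PySpark for that); it uses plain Python, and the record layout (`slug`, `nfts`, `traits`, `trait_type`, `value`), the sample collections, and the helper names are all assumptions made for illustration. It also assumes that a collection qualifies only when both words appear, matched as whole words and case-insensitively, because a plain substring test would find 'male' inside 'female'.

```python
import re

# Whole-word, case-insensitive patterns. A substring test would wrongly
# match "male" inside "female", so word boundaries are required.
MALE = re.compile(r"\bmale\b", re.IGNORECASE)
FEMALE = re.compile(r"\bfemale\b", re.IGNORECASE)


def metadata_text(traits):
    """Flatten a list of {"trait_type": ..., "value": ...} dicts into one string."""
    return " ".join(
        f"{t.get('trait_type', '')} {t.get('value', '')}" for t in traits
    )


def is_gender_labelled(collection):
    """Return True if the collection's NFT metadata mentions both words."""
    text = " ".join(metadata_text(nft["traits"]) for nft in collection["nfts"])
    return bool(MALE.search(text)) and bool(FEMALE.search(text))


# Hypothetical sample data, not taken from the study.
collections = [
    {
        "slug": "avatars-a",
        "nfts": [
            {"traits": [{"trait_type": "Gender", "value": "Male"}]},
            {"traits": [{"trait_type": "Gender", "value": "Female"}]},
        ],
    },
    {
        "slug": "avatars-b",
        "nfts": [{"traits": [{"trait_type": "Gender", "value": "Female"}]}],
    },
    {
        "slug": "landscapes",
        "nfts": [{"traits": [{"trait_type": "Biome", "value": "Desert"}]}],
    },
]

selected = [c["slug"] for c in collections if is_gender_labelled(c)]
print(selected)
```

Under these assumptions only `avatars-a` is selected: `avatars-b` carries 'Female' but never 'male' as a separate word, and `landscapes` has no gender trait at all.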
Read at Hackernoon