The Organizer will provide a database of the complete transaction history of the Ethereum (ETH) blockchain, from the genesis block through the last block of 2020, referred to as the Training Database. In addition, the Organizer will compile, from public records on the Internet, a list of known trusted ETH wallet addresses and a list of known scam ETH wallet addresses that appear in the Training Database; together these are referred to as the Training Labels. Note that neither list is complete. The Organizer will make reasonable efforts to keep the labels accurate, i.e., to ensure high True Positive and True Negative rates. One challenge of the competition is that the vast majority of ETH addresses have no ground-truth label confirming them as either trusted or scam.
Participants in the 0x competition may train classifiers that assign a query ETH address an estimated label between 0 and 1, where 0 indicates no risk and 1 indicates high risk. Any classification method is allowed, as long as the submitted code is written in Python. All participating solutions will be ranked on two batches of test data:
- ETH Test: A randomly selected period of ETH transaction history after 2020 will form the first part of the Test Benchmark.
- Wild Card Test: The transaction history of another, undisclosed blockchain will form the second part of the Test Benchmark and will be used to validate the accuracy of submitted solutions.
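As a minimal sketch of the 0-to-1 labeling convention described above, assuming the Training Labels are available as two sets of address strings, the hypothetical helpers `build_targets` and `to_risk_score` below turn the trusted and scam lists into training targets and keep a model's output inside the required range. Addresses are lowercased because ETH addresses may appear in mixed-case checksum form (EIP-55). Addresses found on both lists are treated as unlabeled; that is an assumption of this sketch, not a rule of the competition.

```python
from typing import Dict, Iterable, Optional, Set


def build_targets(
    addresses: Iterable[str],
    trusted: Set[str],
    scam: Set[str],
) -> Dict[str, Optional[float]]:
    """Hypothetical helper: map each address to 0.0 (trusted), 1.0 (scam), or None (unlabeled)."""
    # Normalize case so checksummed and lowercase forms of an address match.
    trusted_l = {a.lower() for a in trusted}
    scam_l = {a.lower() for a in scam}
    targets: Dict[str, Optional[float]] = {}
    for addr in addresses:
        key = addr.lower()
        in_trusted = key in trusted_l
        in_scam = key in scam_l
        if in_scam and not in_trusted:
            targets[key] = 1.0  # high risk
        elif in_trusted and not in_scam:
            targets[key] = 0.0  # no risk
        else:
            # No ground truth (most addresses), or conflicting labels.
            targets[key] = None
    return targets


def to_risk_score(raw: float) -> float:
    """Hypothetical helper: clamp a raw model output into the [0, 1] risk range."""
    return min(1.0, max(0.0, raw))
```

For example, `build_targets(["0xA", "0xb", "0xc"], trusted={"0xa"}, scam={"0xB"})` yields `{"0xa": 0.0, "0xb": 1.0, "0xc": None}`. Clamping is only one way to respect the range; a model that produces unbounded logits could instead apply a logistic function.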
For the Test Benchmark, all real blockchain addresses will be anonymized, but a one-to-one correspondence between real and anonymized addresses will be preserved. To prevent participants from “gaming” the system, we strongly discourage submissions that attempt to overfit their algorithms to ETH on-chain data after 2020; this is also the chief reason the Organizer is adding a Wild Card Test based on another undisclosed blockchain and its transaction history. Furthermore, to perform well on the more challenging Wild Card Test, solutions are advised not to rely on transaction features specific to the ETH blockchain, such as gas fee values.
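To illustrate what chain-agnostic features might look like, here is a minimal sketch that derives per-address features only from sender, receiver, value, and timestamp, and deliberately ignores gas fees. The `Tx` record, its field names, and the `address_features` function are hypothetical; they do not describe the Training Database schema.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tx:
    # Hypothetical minimal transaction record; not the Training Database schema.
    sender: str
    receiver: str
    value: float    # amount transferred, in the chain's native unit
    timestamp: int  # Unix seconds


def address_features(address: str, txs: List[Tx]) -> Dict[str, float]:
    """Compute features for one address without using any gas-related fields."""
    out_txs = [t for t in txs if t.sender == address]
    in_txs = [t for t in txs if t.receiver == address]
    counterparties = {t.receiver for t in out_txs} | {t.sender for t in in_txs}
    total_out = sum(t.value for t in out_txs)
    total_in = sum(t.value for t in in_txs)
    times = [t.timestamp for t in out_txs + in_txs]
    return {
        "n_out": float(len(out_txs)),
        "n_in": float(len(in_txs)),
        "n_counterparties": float(len(counterparties)),
        "total_out": total_out,
        "total_in": total_in,
        # Ratio of value sent out to value received (0.0 if nothing was received).
        "out_in_ratio": total_out / total_in if total_in > 0 else 0.0,
        # Time between the address's first and last observed transaction.
        "active_seconds": float(max(times) - min(times)) if times else 0.0,
    }
```

Because native-unit values differ in scale from one blockchain to another, ratio and count features such as `out_in_ratio` and `n_counterparties` are likely to transfer better to an unseen chain than raw totals.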
The Training Database release will include a static database file of about 500 GB, sample Python scripts for efficiently retrieving transaction data, and a README file with installation instructions.
Important Dates:
- Training Database Release: TBD
- Competition Submission Deadline: TBD