Assessment - Grand Challenge

ASSESSMENT METHODS

Metrics

Runtime is a crucial parameter with regard to clinical applicability and shall be provided together with hardware requirements for all submissions.

Other parameters depend on the actual algorithm classes:

Task 3: Rupture risk

For the assessment of the rupture risk classification, we will calculate recall and precision. The ranking will be based on the F2-score.

An aneurysm at risk of rupture should not be overlooked. On the other hand, too many false positives mean tedious screening for the physician, who has to review each risk assessment before making a decision. The F2-score combines recall and precision, weighting recall more heavily, so that identifying aneurysms at risk counts for more than avoiding false-positive risk classifications.
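For reference, the general F-beta score is (1 + beta^2) * precision * recall / (beta^2 * precision + recall), which for beta = 2 becomes 5 * precision * recall / (4 * precision + recall). The following is a minimal sketch of that computation from confusion-matrix counts. The function name and the convention of returning 0.0 when precision or recall is undefined are assumptions made for illustration, not the organizers' evaluation code.

```python
def f_beta(tp, fp, fn, beta=2.0):
    """F-beta score from true positives, false positives and false negatives.

    Assumption: when precision or recall is undefined (zero denominator),
    the score is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Hypothetical counts: 8 at-risk aneurysms found, 4 false alarms, 2 missed.
print(round(f_beta(tp=8, fp=4, fn=2), 4))
```

With these hypothetical counts, recall (0.8) is higher than precision (about 0.67), so the F2-score comes out above the F1-score for the same counts, which reflects the heavier weight on recall.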

During the submission test phase, submissions will be checked for completeness, and participants will be notified if cases are missing. During the validation phase, missing cases will be treated as algorithm failures and assigned the worst possible metric value in each category.
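The handling of missing cases can be illustrated with a short sketch. It assumes, purely for illustration, that a submission is a mapping from case ID to a boolean "at risk" prediction, and that a missing case is counted as a wrong prediction for that case. The function names and data layout are hypothetical, and the actual evaluation pipeline may apply the penalty differently.

```python
def find_missing_cases(expected_ids, submitted):
    """Return the sorted case IDs that have no prediction in the submission."""
    return sorted(set(expected_ids) - set(submitted))


def confusion_counts(ground_truth, submitted):
    """Count TP/FP/FN/TN, scoring every missing case as a wrong prediction.

    Assumption: "algorithm failure" is modeled by predicting the opposite
    of the ground-truth label for each case absent from the submission.
    """
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for case_id, at_risk in ground_truth.items():
        predicted = submitted.get(case_id, not at_risk)
        if at_risk and predicted:
            counts["tp"] += 1
        elif at_risk and not predicted:
            counts["fn"] += 1
        elif not at_risk and predicted:
            counts["fp"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical case IDs and labels.
ground_truth = {"case_01": True, "case_02": False, "case_03": True}
submitted = {"case_01": True}

print(find_missing_cases(ground_truth, submitted))
print(confusion_counts(ground_truth, submitted))
```

In this sketch the completeness check corresponds to the test phase (report the missing IDs), while the penalized counts correspond to the validation phase.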

The metrics for each submission will be shown in a table with accompanying boxplots, so that algorithm performance can be compared per class and biases can be analyzed. For each metric, we will also analyze the coefficient of variation.
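The coefficient of variation is the standard deviation divided by the mean, so it expresses the spread of a metric relative to its typical value. A minimal sketch follows. Whether the sample or the population standard deviation is used is not stated here, so the choice of the sample standard deviation below is an assumption, as are the function name and the example values.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean.

    Assumption: the sample (n - 1) standard deviation is used.
    Needs at least two values; a zero mean leaves the ratio undefined.
    """
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.stdev(values) / mean


# Hypothetical per-class values of one metric.
print(round(coefficient_of_variation([0.8, 0.9, 1.0]), 4))
```

A lower value indicates that a metric is more stable across the compared groups.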