Meet the Skin Condition Image Network (SCIN) dataset, a new collaboration between Google and Stanford Medicine.
SCIN was designed to reflect the broad range of concerns that people search for online and to supplement the types of conditions typically found in clinical datasets.
It contains images across various skin tones and body parts to help ensure that future AI tools work effectively for all, writes Pooja Rao, Research Scientist, Google Research, in a blog post.
The SCIN dataset is available as an open-access resource for researchers, educators, and developers, and its creators have taken careful steps to protect contributor privacy.
The SCIN dataset currently contains more than 10,000 images of skin, nail, or hair conditions, directly contributed by individuals experiencing them. All contributions were made voluntarily with informed consent by individuals in the US, under an institutional-review board approved study.
Contributors were asked to take images both close-up and from slightly further away, and were given the option to self-report demographic information and tanning propensity (self-reported Fitzpatrick Skin Type, i.e., sFST), and to describe the texture, duration, and symptoms related to their concern.
One to three dermatologists labeled each contribution with up to five dermatology conditions, along with a confidence score for each label. The SCIN dataset contains these individual labels, as well as an aggregated and weighted differential diagnosis derived from them that could be useful for model testing or training, Rao writes. Researchers created the SCIN dataset via a novel crowdsourcing method described in an accompanying research paper.
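To illustrate how individual dermatologist labels might be combined into a weighted differential diagnosis of the kind described above, here is a minimal sketch. The field names and aggregation scheme are assumptions for illustration only; the actual SCIN release defines its own schema and aggregation method.

```python
from collections import defaultdict


def weighted_differential(rater_labels):
    """Aggregate per-dermatologist labels into a normalized weighted
    differential diagnosis.

    rater_labels: a list with one dict per dermatologist, mapping a
    condition name to that rater's confidence score (hypothetical
    structure, not the official SCIN schema).
    """
    totals = defaultdict(float)
    for labels in rater_labels:
        for condition, confidence in labels.items():
            totals[condition] += confidence
    grand_total = sum(totals.values())
    # Normalize so the weights across the differential sum to 1.
    return {cond: weight / grand_total for cond, weight in totals.items()}


# Example: three dermatologists label one contribution, each listing
# up to five conditions with a confidence score per label.
raters = [
    {"Eczema": 0.8, "Psoriasis": 0.2},
    {"Eczema": 0.6, "Contact dermatitis": 0.4},
    {"Eczema": 1.0},
]
print(weighted_differential(raters))
```

Under this toy scheme, conditions named by more raters or with higher confidence receive proportionally more weight, which is the general idea behind using an aggregated differential for model training or testing.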