The Real Data Corpus

The Real Data Corpus (RDC) is a collection of raw data extracted from data-carrying devices that were purchased on the secondary market around the world. Many studies have shown that hard drives, cell phones, USB memory sticks, and other data-carrying devices are frequently discarded by their original users without the data first being cleared or purged. By purchasing these devices and extracting their data, we have created a data set that closely resembles the data found on devices in real-world use.

The corpus includes:

  • A total of 156 hard drive images ranging in size from 500 MB to 80 GB.
  • Approximately 600 flash memory images (USB, Sony Memory Stick, SD, and others), ranging in size from 128 MB to 4 GB.
  • Approximately 100 CDs, all purchased outside the US.
  • Approximately 10 digital camera memory images.
  • Approximately 40 GSM SIM chip memory images.