BoxNet Data Upload

BoxNet is a deep convolutional neural network for particle picking, pre-trained on a large set of hand-picked data. To make that set even bigger and improve picking accuracy, we rely on you, the users, to contribute new training data.

 

How do I submit new data?

This user guide page will help you prepare a TIFF stack for submission inside Warp. Once you’re done, use the form below to upload the file.

When selecting particles for general BoxNet training, please remember not to be too selective. Everything that is a protein counts, not just the most complete protein complexes. This may differ from the selection strategy you use to optimize BoxNet picking for your particular sample, where you might want to avoid particles below a certain size.

Can anyone use the data to scoop me?

Your data will be downscaled to 8 Å/px and made part of BoxNet’s public repository. This means the best classes anyone can obtain from your data are limited to 16 Å resolution (the Nyquist limit, twice the pixel size). The description of your sample can be as general as you want, and you can remain anonymous.

If you think this would still reveal too much about your ongoing work, please use the form below to specify a future publication date. In that case, our curated version of BoxNet will still use it, but it won’t be part of the dataset everyone can download for local re-training until after the specified date.

I don’t use Warp. Can I still contribute?

Absolutely! The training data are stored as multi-layer TIFF files and adhere to the following conventions:

  • One TIFF per data set.
  • FP32 pixel format.
  • Micrographs are scaled to 8 Å/px using Fourier-space padding/cropping.
  • 3 consecutive layers per micrograph, containing:
    • n * 3 + 0: The micrograph itself, normalized to mean = 0, std.dev. = 1, no additional filtering.
    • n * 3 + 1: Per-pixel labels, with 0 = background, 1 = particle, 2 = high-contrast artifact (ethane etc.). For each particle, a circle around its center is labeled as particle, with a diameter of 0.25 * particle diameter, but at least 3 px.
    • n * 3 + 2: Per-pixel importance. The default is 1. Higher values give a pixel more weight in training, lower values give it less. Values over 5 and below 0.2 are discouraged.
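
If you are preparing a stack outside Warp, the conventions above could be sketched in Python with NumPy roughly as follows. This is a minimal illustration, not Warp’s own code: the function names are hypothetical, the particle diameter is assumed to be given in pixels at 8 Å/px, and letting artifact labels override particle labels where they overlap is an assumption the conventions do not spell out.

```python
import numpy as np

TARGET_ANGPIX = 8.0  # Å/px, from the conventions above


def fourier_rescale(image, angpix, target_angpix=TARGET_ANGPIX):
    """Rescale a 2D image by cropping (or zero-padding) its centered spectrum."""
    old_shape = np.array(image.shape)
    new_shape = np.maximum(1, np.round(old_shape * angpix / target_angpix).astype(int))
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    out = np.zeros(new_shape, dtype=complex)
    copy = np.minimum(old_shape, new_shape)
    src = old_shape // 2 - copy // 2  # after fftshift the DC term sits at size // 2
    dst = new_shape // 2 - copy // 2
    out[dst[0]:dst[0] + copy[0], dst[1]:dst[1] + copy[1]] = \
        spectrum[src[0]:src[0] + copy[0], src[1]:src[1] + copy[1]]
    rescaled = np.fft.ifft2(np.fft.ifftshift(out)).real
    return rescaled * (new_shape.prod() / old_shape.prod())  # preserve the mean


def normalize(image):
    """Shift and scale to mean 0, std. dev. 1, stored as FP32."""
    image = image.astype(np.float32)
    return (image - image.mean()) / image.std()


def label_layer(shape, centers_px, particle_diameter_px, artifact_mask=None):
    """Per-pixel labels: 0 = background, 1 = particle disc, 2 = artifact."""
    labels = np.zeros(shape, dtype=np.float32)
    disc_diameter = max(0.25 * particle_diameter_px, 3.0)  # at least 3 px
    yy, xx = np.indices(shape)
    for cy, cx in centers_px:
        inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= (disc_diameter / 2) ** 2
        labels[inside] = 1
    if artifact_mask is not None:
        labels[artifact_mask] = 2  # assumption: artifacts take precedence
    return labels


def interleave_layers(entries):
    """Order layers so micrograph n occupies indices n*3+0, n*3+1, n*3+2."""
    layers = []
    for micrograph, labels, importance in entries:
        layers.append(normalize(micrograph))
        layers.append(labels.astype(np.float32))
        layers.append(importance.astype(np.float32))
    return layers


# Example: one synthetic 64 x 64 micrograph recorded at 2 Å/px.
rng = np.random.default_rng(0)
micrograph = fourier_rescale(rng.normal(size=(64, 64)), angpix=2.0)  # 16 x 16
labels = label_layer(micrograph.shape, [(8, 8)], particle_diameter_px=20)
importance = np.ones(micrograph.shape, dtype=np.float32)  # default importance
layers = interleave_layers([(micrograph, labels, importance)])
```

The resulting list of FP32 layers still has to be written as pages of a single multi-layer TIFF with the TIFF library of your choice; that step is left out here because it depends on the library.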

We also encourage developers to integrate BoxNet into their SPA tools, or to train novel network architectures using BoxNet’s extensive data set. It’s all free under GPLv3. To get started, take a look at the TensorFlow code and its integration in Warp.

 

File upload