File analysis on upload

When you upload files to your project, we extract data about each file that you can put to use:

  • Basic file information, e.g. EXIF.
  • Insights into the file's content.
  • Optional file analyses (e.g. virus checking or object recognition).

You can use this data to validate and moderate incoming files.

File and content info

Right after we receive a file, we can return information about the file and its content. We also inspect the file's contents to identify its correct MIME type.

API response example:

"datetime_removed": null,
"datetime_stored": "2018-11-26T12:49:10.477888Z",
"datetime_uploaded": "2018-11-26T12:49:09.945335Z",
"is_image": true,
"is_ready": true,
"mime_type": "image/jpeg",
"original_file_url": "https://ucarecdn.com/22240276-2f06-41f8-9411-755c8ce926ed/pineapple.jpg",
"original_filename": "pineapple.jpg",
"size": 642,
"url": "https://api.uploadcare.com/files/22240276-2f06-41f8-9411-755c8ce926ed/",
"uuid": "22240276-2f06-41f8-9411-755c8ce926ed",
"variations": null,
"content_info": {
  "mime": {
    "mime": "image/jpeg",
    "type": "image",
    "subtype": "jpeg"
  },
  "image": {
    "format": "JPEG",
    "width": 500,
    "height": 500,
    "sequence": false,
    "orientation": 6,
    "geo_location": {
      "latitude": 55.62013611111111,
      "longitude": 37.66299166666666
    },
    "datetime_original": "2018-08-20T08:59:50",
    "dpi": [
      72,
      72
    ]
  }
},
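As a rough illustration of validating an incoming file with this data, the sketch below checks a parsed file info dictionary against a MIME type allowlist and a maximum image dimension. The function name `validate_upload`, the allowlist, and the size limit are hypothetical choices for this example; only the field names are taken from the response above.

```python
# Hypothetical validation rules; adjust them to your own project.
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png"}
MAX_DIMENSION = 4000  # assumed limit, in pixels


def validate_upload(file_info: dict) -> list:
    """Return a list of problems; an empty list means the file passed."""
    problems = []
    content_info = file_info.get("content_info") or {}

    # The detected MIME type lives under content_info.mime.mime.
    mime = (content_info.get("mime") or {}).get("mime")
    if mime not in ALLOWED_MIME_TYPES:
        problems.append(f"unsupported MIME type: {mime}")

    # The "image" section is only expected for image files.
    image = content_info.get("image")
    if image is not None:
        if max(image["width"], image["height"]) > MAX_DIMENSION:
            problems.append(f"image exceeds {MAX_DIMENSION} px on one side")

    return problems


# A trimmed copy of the response shown above.
sample = {
    "content_info": {
        "mime": {"mime": "image/jpeg", "type": "image", "subtype": "jpeg"},
        "image": {"format": "JPEG", "width": 500, "height": 500},
    }
}

print(validate_upload(sample))
```

Returning a list of problems rather than a single boolean makes it easier to report every failed rule to the uploader at once.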

Optional data analysis

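Results of optional analyses are returned in the appdata field. The example below includes results from virus scanning (uc_clamav_virus_scan), background removal (remove_bg), and label detection (aws_rekognition_detect_labels).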

API response example:

"appdata": {
    "uc_clamav_virus_scan": {
      "data": {
        "infected": true,
        "infected_with": "Win.Test.EICAR_HDB-1"
      },
      "version": "0.104.2",
      "datetime_created": "2021-09-21T11:24:33.159663Z",
      "datetime_updated": "2021-09-21T11:24:33.159663Z"
    },
    "remove_bg": {
      "data": {
        "foreground_type": "person"
      },
      "version": "1.0",
      "datetime_created": "2021-07-25T12:24:33.159663Z",
      "datetime_updated": "2021-07-25T12:24:33.159663Z"
    },
    "aws_rekognition_detect_labels": {
      "data": {
        "LabelModelVersion": "2.0",
        "Labels": [
          {
            "Confidence": 93.41645812988281,
            "Instances": [],
            "Name": "Home Decor",
            "Parents": []
          },
          {
            "Confidence": 70.75951385498047,
            "Instances": [],
            "Name": "Linen",
            "Parents": [
              {
                "Name": "Home Decor"
              }
            ]
          },
          {
            "Confidence": 64.7123794555664,
            "Instances": [],
            "Name": "Sunlight",
            "Parents": []
          },
          {
            "Confidence": 56.264793395996094,
            "Instances": [],
            "Name": "Flare",
            "Parents": [
              {
                "Name": "Light"
              }
            ]
          },
          {
            "Confidence": 50.47153854370117,
            "Instances": [],
            "Name": "Tree",
            "Parents": [
              {
                "Name": "Plant"
              }
            ]
          }
        ]
      },
      "version": "2016-06-27",
      "datetime_created": "2021-09-21T11:25:31.259763Z",
      "datetime_updated": "2021-09-21T11:27:33.359763Z"
    }
  }
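A minimal sketch of how this data might drive moderation, assuming the appdata object has already been parsed into a Python dictionary. The helper names `is_infected` and `confident_labels` and the confidence threshold of 60 are hypothetical; the keys mirror the example response above.

```python
def is_infected(appdata: dict) -> bool:
    """Return True if a virus scan result is present and reports an infection."""
    scan = appdata.get("uc_clamav_virus_scan")
    if scan is None:
        # No scan result available; this sketch treats that as "not infected".
        return False
    return bool(scan["data"].get("infected"))


def confident_labels(appdata: dict, min_confidence: float = 60.0) -> list:
    """Return names of detected labels at or above the confidence threshold."""
    result = appdata.get("aws_rekognition_detect_labels")
    if result is None:
        return []
    return [
        label["Name"]
        for label in result["data"]["Labels"]
        if label["Confidence"] >= min_confidence
    ]


# A trimmed copy of the appdata object shown above.
appdata = {
    "uc_clamav_virus_scan": {
        "data": {"infected": True, "infected_with": "Win.Test.EICAR_HDB-1"}
    },
    "aws_rekognition_detect_labels": {
        "data": {
            "Labels": [
                {"Confidence": 93.41645812988281, "Name": "Home Decor"},
                {"Confidence": 70.75951385498047, "Name": "Linen"},
                {"Confidence": 64.7123794555664, "Name": "Sunlight"},
                {"Confidence": 56.264793395996094, "Name": "Flare"},
                {"Confidence": 50.47153854370117, "Name": "Tree"},
            ]
        }
    },
}

print(is_infected(appdata))
print(confident_labels(appdata))
```

A stricter policy could treat a missing scan result as "unknown" and hold the file until the result arrives, instead of letting it through.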

Perceptual hash

The file info response includes a perceptual hash calculated from the pixel contents of an image. Perceptual hashing is a common fingerprinting technique for quickly comparing images and finding duplicates or similar images.

Uploadcare automatically calculates a 64-bit perceptual hash and returns it as a hexadecimal string. In this example, the perceptual hash value is 940f5fd09aa48ddc:

{
  "id": "1b192edb-212d-401a-ad9b-529047272e1b",
  "datetime_original": null,
  "orientation": null,
  "height": 1600,
  "width": 2400,
  "geo_location": null,
  "format": "JPEG",
  "hash": "940f5fd09aa48ddc"
}

To find image duplicates, compare their perceptual hash values for equality. To find similar images, compare the hash values bitwise and count the differing bits (the Hamming distance). A small number of differing bits (e.g., up to 8) corresponds to subtle changes in the visual content, while dissimilar images usually differ in more than 8 bits.
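The bitwise comparison can be sketched as follows. The function names are hypothetical, the second hash value is made up for illustration (it differs from the example hash in its last bit), and the default threshold of 8 bits follows the guideline above.

```python
def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Count the differing bits between two hex-encoded perceptual hashes."""
    # XOR leaves a 1 in every bit position where the two hashes differ.
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_similar(hash_a: str, hash_b: str, max_bits: int = 8) -> bool:
    """Treat images as similar when at most max_bits bits differ."""
    return hamming_distance(hash_a, hash_b) <= max_bits


original = "940f5fd09aa48ddc"   # hash from the example response
candidate = "940f5fd09aa48ddd"  # made-up value, last bit flipped

print(hamming_distance(original, candidate))  # 1
print(is_similar(original, candidate))        # True
```

Parsing the hex strings as integers keeps the comparison to a single XOR and a bit count, which is cheap enough to run against many stored hashes.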