Tracking incorrect MD5 Checksum errors from Copernicus Data Ecosystem #29

Open
sharkinsspatial opened this issue Nov 6, 2023 · 1 comment
Assignees
Labels
bug Something isn't working

Comments

@sharkinsspatial
Collaborator

We have noted large numbers of incorrect MD5 checksums for the Sentinel-2 products we are downloading.

As an example, we downloaded product 5a9eecf4-4114-46d9-a4a5-fcc023d895e4 from the zipper endpoint and obtained its MD5 checksum from the OData endpoint.

The OData MD5 checksum is ba9fced9cf70360be62b5afed48395c5, while the downloaded file's actual MD5 checksum is 82ef414f463d293ab58c77c2e05d5f42.

We have encountered this issue for thousands of products over the past 2 weeks.

@freitagb has reached out directly via email to the ESA representative to try to escalate this issue, but note that we will see greatly reduced Sentinel-2 download throughput and S30 production until it is addressed.

ref https://github.com/NASA-IMPACT/hls_development/issues/148

@ceholden
Collaborator

ceholden commented Jan 15, 2025

We've encountered this issue again, beginning on January 14, 2025 and so far lasting two days. The impact so far is delays and extra cost: the downloader was able to complete yesterday's ingestion, but for today's we've only been able to process ~2,000 of the ~6,000 granules available.

I spot checked to confirm that the MD5 we get from ESA is incorrect. I've noticed that the ModificationDate for these granules is just slightly more recent than the ChecksumDate they provide. If they modified the file after computing the checksum, that would be an obvious reason for the difference...

For example, here is the response for one of the granules we spot checked:
https://catalogue.dataspace.copernicus.eu/odata/v1/Products?$filter=Id%20eq%20%27714aef16-b1f1-46cd-9fba-0e004adaa405%27

returns

{
  "@odata.context": "$metadata#Products",
  "value": [
    {
      "@odata.mediaContentType": "application/octet-stream",
      "Id": "714aef16-b1f1-46cd-9fba-0e004adaa405",
      "Name": "S2B_MSIL1C_20250113T171559_N0511_R112_T14SME_20250113T204513.SAFE",
      "ContentType": "application/octet-stream",
      "ContentLength": 72327444,
      "OriginDate": "2025-01-13T21:10:34.000000Z",
      "PublicationDate": "2025-01-13T21:17:45.531550Z",
      "ModificationDate": "2025-01-13T21:19:04.477954Z",
      "Online": true,
      "EvictionDate": "9999-12-31T23:59:59.999999Z",
      "S3Path": "/eodata/Sentinel-2/MSI/L1C/2025/01/13/S2B_MSIL1C_20250113T171559_N0511_R112_T14SME_20250113T204513.SAFE",
      "Checksum": [
        {
          "Value": "bcb0314f41c617b213d29d013e148e6d",
          "Algorithm": "MD5",
          "ChecksumDate": "2025-01-13T21:19:03.615561Z"
        },
        {
          "Value": "449ee789e0b1442b36ecf71b70476f3e8550a1f2190c7f2cc6b07bf7cc5ecc0f",
          "Algorithm": "BLAKE3",
          "ChecksumDate": "2025-01-13T21:19:03.740767Z"
        }
      ],
      "ContentDate": {
        "Start": "2025-01-13T17:15:59.024000Z",
        "End": "2025-01-13T17:15:59.024000Z"
      },
      "Footprint": "geography'SRID=4326;POLYGON ((-99.09942034046604 35.15399768603804, -98.89284079765551 35.1548426456455, -98.89183215851756 35.905410316101104, -98.90500756008365 35.857852899353226, -98.94602935520302 35.710335365114325, -98.98694516403154 35.562832740077, -99.02753927942368 35.415274223132414, -99.06803398620048 35.26768828282314, -99.09942034046604 35.15399768603804))'",
      "GeoFootprint": {
        "type": "Polygon",
        "coordinates": [
          [
            [
              -99.09942034046604,
              35.15399768603804
            ],
            [
              -98.89284079765551,
              35.1548426456455
            ],
            [
              -98.89183215851756,
              35.905410316101104
            ],
            [
              -98.90500756008365,
              35.857852899353226
            ],
            [
              -98.94602935520302,
              35.710335365114325
            ],
            [
              -98.98694516403154,
              35.562832740077
            ],
            [
              -99.02753927942368,
              35.415274223132414
            ],
            [
              -99.06803398620048,
              35.26768828282314
            ],
            [
              -99.09942034046604,
              35.15399768603804
            ]
          ]
        ]
      }
    }
  ]
}
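
The ModificationDate / ChecksumDate observation can be checked programmatically. The sketch below is a hedged illustration, not the downloader's code: the helper names are hypothetical, the `EXAMPLE` dict is trimmed to the fields from the response above, and the idea that a positive lag explains a bad checksum is only the hypothesis raised in this comment, not confirmed ESA behavior.

```python
from datetime import datetime
from typing import Optional
from urllib.parse import quote

ODATA_PRODUCTS = "https://catalogue.dataspace.copernicus.eu/odata/v1/Products"


def product_query_url(product_id: str) -> str:
    """Build the same $filter query used above for a single product Id."""
    return f"{ODATA_PRODUCTS}?$filter=" + quote(f"Id eq '{product_id}'", safe="")


def parse_ts(value: str) -> datetime:
    """Parse the catalogue's UTC timestamps (trailing 'Z') into aware datetimes."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def md5_entry(product: dict) -> Optional[dict]:
    """Return the MD5 entry from the product's Checksum list, if present."""
    for entry in product.get("Checksum", []):
        if entry.get("Algorithm") == "MD5":
            return entry
    return None


def checksum_lag_seconds(product: dict) -> Optional[float]:
    """Seconds by which ModificationDate is later than the MD5 ChecksumDate.

    A positive value means the product was modified after the checksum
    was recorded.
    """
    entry = md5_entry(product)
    if entry is None:
        return None
    modified = parse_ts(product["ModificationDate"])
    checksummed = parse_ts(entry["ChecksumDate"])
    return (modified - checksummed).total_seconds()


# Trimmed copy of one item from the "value" list in the response above.
EXAMPLE = {
    "Id": "714aef16-b1f1-46cd-9fba-0e004adaa405",
    "ModificationDate": "2025-01-13T21:19:04.477954Z",
    "Checksum": [
        {
            "Value": "bcb0314f41c617b213d29d013e148e6d",
            "Algorithm": "MD5",
            "ChecksumDate": "2025-01-13T21:19:03.615561Z",
        }
    ],
}

print(product_query_url(EXAMPLE["Id"]))
print(checksum_lag_seconds(EXAMPLE))  # positive: modified about 0.86 s after the checksum
```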

EDIT / UPDATE - I checked again today for the example granule I used above (ID = 714aef16-b1f1-46cd-9fba-0e004adaa405) and the checksum was CORRECT. The metadata indicated that the granule was modified yet again, but this time the checksum was computed just after the modification date.

We have an email draft to a contact at ESA that we'll send out shortly. I added this example as part of a question related to why we're seeing incorrect checksums. I'm hoping that the things we've noted can help them resolve this issue, or at least they can provide us with some tips about what we should be doing to mitigate this issue.

For posterity, here is the updated response from ESA's OData product API:
https://catalogue.dataspace.copernicus.eu/odata/v1/Products?$filter=Id%20eq%20%27714aef16-b1f1-46cd-9fba-0e004adaa405%27

returns

{
  "@odata.context": "$metadata#Products",
  "value": [
    {
      "@odata.mediaContentType": "application/octet-stream",
      "Id": "714aef16-b1f1-46cd-9fba-0e004adaa405",
      "Name": "S2B_MSIL1C_20250113T171559_N0511_R112_T14SME_20250113T204513.SAFE",
      "ContentType": "application/octet-stream",
      "ContentLength": 72327444,
      "OriginDate": "2025-01-13T21:10:34.000000Z",
      "PublicationDate": "2025-01-13T21:17:45.531550Z",
      "ModificationDate": "2025-01-16T14:10:05.486789Z",
      "Online": true,
      "EvictionDate": "9999-12-31T23:59:59.999999Z",
      "S3Path": "/eodata/Sentinel-2/MSI/L1C/2025/01/13/S2B_MSIL1C_20250113T171559_N0511_R112_T14SME_20250113T204513.SAFE",
      "Checksum": [
        {
          "Value": "5353d1722cb936e032edada9083019dd",
          "Algorithm": "MD5",
          "ChecksumDate": "2025-01-16T14:10:00.590540Z"
        },
        {
          "Value": "e0c6ef1937c18b52f1ef548ea240cc50f6aaf62065fc6d9111932b9ff6dee3bf",
          "Algorithm": "BLAKE3",
          "ChecksumDate": "2025-01-16T14:10:00.721388Z"
        }
      ],
      "ContentDate": {
        "Start": "2025-01-13T17:15:59.024000Z",
        "End": "2025-01-13T17:15:59.024000Z"
      },
      "Footprint": "geography'SRID=4326;POLYGON ((-99.09942034046604 35.15399768603804, -98.89284079765551 35.1548426456455, -98.89183215851756 35.905410316101104, -98.90500756008365 35.857852899353226, -98.94602935520302 35.710335365114325, -98.98694516403154 35.562832740077, -99.02753927942368 35.415274223132414, -99.06803398620048 35.26768828282314, -99.09942034046604 35.15399768603804))'",
      "GeoFootprint": {
        "type": "Polygon",
        "coordinates": [
          [
            [-99.099420340466, 35.153997686038],
            [-98.8928407976555, 35.1548426456455],
            [-98.8918321585176, 35.9054103161011],
            [-98.9050075600837, 35.8578528993532],
            [-98.946029355203, 35.7103353651143],
            [-98.9869451640315, 35.562832740077],
            [-99.0275392794237, 35.4152742231324],
            [-99.0680339862005, 35.2676882828231],
            [-99.099420340466, 35.153997686038]
          ]
        ]
      }
    }
  ]
}
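
To see what changed between two catalogue responses for the same product, a small field-by-field comparison can help. This is a sketch under assumptions: `changed_fields` is a hypothetical helper, and the two dicts are trimmed to a few fields copied from the original and updated responses for this granule.

```python
def changed_fields(before: dict, after: dict) -> list:
    """Return the sorted top-level keys whose values differ between snapshots."""
    keys = set(before) | set(after)
    return sorted(k for k in keys if before.get(k) != after.get(k))


# Trimmed from the original response for 714aef16-b1f1-46cd-9fba-0e004adaa405.
ORIGINAL = {
    "ContentLength": 72327444,
    "PublicationDate": "2025-01-13T21:17:45.531550Z",
    "ModificationDate": "2025-01-13T21:19:04.477954Z",
    "Checksum": [
        {"Value": "bcb0314f41c617b213d29d013e148e6d", "Algorithm": "MD5"},
    ],
}

# Trimmed from the updated response for the same product.
UPDATED = {
    "ContentLength": 72327444,
    "PublicationDate": "2025-01-13T21:17:45.531550Z",
    "ModificationDate": "2025-01-16T14:10:05.486789Z",
    "Checksum": [
        {"Value": "5353d1722cb936e032edada9083019dd", "Algorithm": "MD5"},
    ],
}

print(changed_fields(ORIGINAL, UPDATED))
```

For these trimmed snapshots the differing keys are `Checksum` and `ModificationDate`, while `ContentLength` and `PublicationDate` are identical in both responses.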

3 participants