Mastercard Directory Server service degradation
Incident Report for 3dsecure.io
Postmortem

We have been in contact with Mastercard and have been able to construct a timeline for the incident.

Mastercard is currently establishing a new datacenter for EMV 3DS transactions. On 2022-08-09, Mastercard conducted a test where 3-D Secure v2 Directory Server traffic was switched to the new datacenter. The test ran for 10 minutes before the traffic was switched back. The switch was done using DNS.

The load balancer in the old datacenter is configured to rewrite an empty path in HTTP requests to the path expected by the directory servers behind it. This configuration was missing on the load balancer in the new datacenter. As a result, all Authentication Requests (AReqs) sent to the new datacenter received an HTTP 404 response.
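The rewrite rule described above can be illustrated with a minimal sketch. This is not Mastercard's configuration: the function name and the path `/directory-server` are hypothetical, and treating a bare `/` as an empty path is an assumption.

```python
# Hypothetical backend path; the real path expected by the directory
# servers is not stated in this report.
EXPECTED_PATH = "/directory-server"


def rewrite_empty_path(request_path: str, expected_path: str = EXPECTED_PATH) -> str:
    """Return the path that should be forwarded to the backend.

    An empty path (or a bare "/", which is an assumption here) is replaced
    by the path the backend expects. Any other path passes through unchanged.
    """
    if request_path in ("", "/"):
        return expected_path
    return request_path
```

Without such a rule, a request with an empty path reaches the backend unchanged, and a backend that only serves the expected path answers with 404.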

Incident timeline

At 2022-08-09T19:44Z, our monitoring systems detected a sharp increase in HTTP 404 responses from Mastercard. The error rate fluctuated between 30% and 100%. Our operations team inspected our service and found no network or service abnormalities originating from our systems. On the contrary, all services were able to send AReqs to Mastercard's directory servers, and the 404 responses were evenly distributed across all services.
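As an illustration of the kind of error-rate measurement mentioned above, the following sketch tracks the share of 404 responses over the most recent requests. It is not our monitoring system: the class name and window size are hypothetical.

```python
from collections import deque


class ErrorRateWindow:
    """Track the share of a given status code over the last `size` responses."""

    def __init__(self, size: int = 100) -> None:
        # A bounded deque drops the oldest entry once `size` is reached.
        self._statuses = deque(maxlen=size)

    def record(self, status_code: int) -> None:
        self._statuses.append(status_code)

    def rate(self, status_code: int = 404) -> float:
        """Return the fraction (0.0 to 1.0) of recorded responses with this status."""
        if not self._statuses:
            return 0.0
        hits = sum(1 for s in self._statuses if s == status_code)
        return hits / len(self._statuses)
```

Keeping one such window per service instance would also show whether errors are evenly distributed or concentrated on a single instance.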

At this point, we were expecting a service disruption announcement from Mastercard.

At 2022-08-10T09:06Z, we escalated the issue with Mastercard.

At 2022-08-10T16:37Z, our case was forwarded to the team responsible for directory servers. That team's investigation identified the missing configuration and added it to the new load balancer.

At 2022-08-10T17:42Z, Mastercard reached out and asked us to refresh all network communication caches. Our operations team terminated all established TLS connections and scaled up the number of instances to absorb any increase in traffic once the service became available again.

At 2022-08-10T18:56Z, the HTTP 404 error rate towards Mastercard's directory server had dropped to zero, bringing 3-D Secure v2 for Mastercard back to normal operation.

At this point, Mastercard had not provided any of the incident details. We therefore kept our systems in a heightened monitoring state while awaiting further answers.

At 2022-08-12T11:59Z we marked the issue as resolved.

We have been working with Mastercard to construct a timeline of events ever since the incident started.

Lessons learned

The incident revealed several necessary improvements to our systems, and work on implementing them has begun.

One improvement is better logging of outgoing requests to directory servers. We want to be able to detect any change in directory servers ourselves, rather than needing confirmation from a third party. This will improve our response time and enable us to resolve incidents faster.
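One possible way to detect a change in directory servers is to compare the set of addresses a hostname resolves to between checks and log the difference. The sketch below is an assumption about how this could look, not a description of our implementation; the function names are hypothetical.

```python
from __future__ import annotations

import socket


def resolve_addresses(host: str, port: int = 443) -> frozenset[str]:
    """Resolve `host` and return the set of IP addresses it currently maps to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return frozenset(info[4][0] for info in infos)


def detect_change(previous: frozenset[str], current: frozenset[str]) -> dict[str, list[str]]:
    """Return which addresses appeared and which disappeared between two lookups."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }
```

Logging the result of `detect_change` together with the response status of each outgoing request would make it possible to correlate a rise in errors with a switch to different servers.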

Another improvement is reworking the connection policy for the service responsible for AReqs, including faster termination of TLS connections and faster DNS rediscovery.
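A connection policy of this kind could, for example, retire a TLS connection once it exceeds a maximum age or after repeated errors, so that the replacement connection triggers a fresh DNS lookup. The sketch below is illustrative only: the names, the 60-second maximum age, and the error threshold of 3 are assumptions, not our actual settings.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    opened_at: float             # timestamp (seconds) when the connection was established
    consecutive_errors: int = 0  # error responses seen in a row on this connection


def should_retire(
    conn: ConnectionState,
    now: float,
    max_age_seconds: float = 60.0,  # illustrative value
    error_threshold: int = 3,       # illustrative value
) -> bool:
    """Decide whether a connection should be closed and re-established.

    A connection is retired when it is older than `max_age_seconds` or has
    seen `error_threshold` consecutive errors. Opening the replacement
    connection resolves the hostname again, which picks up DNS changes.
    """
    too_old = (now - conn.opened_at) >= max_age_seconds
    failing = conn.consecutive_errors >= error_threshold
    return too_old or failing
```

In practice `now` would come from a monotonic clock such as `time.monotonic()`, and the DNS resolver's own cache lifetime would need to be bounded as well.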

The incident has also shown that the current policy for incident escalation was inadequate.

In conjunction with Mastercard, we will improve the communication channel between 3dsecure.io and the Mastercard Identity Check team.

We are currently updating our incident response plan to incorporate the lessons learned, and will round this off with a fire drill exercise.

Sincerely,

your operations team

Posted Aug 23, 2022 - 13:34 UTC

Resolved
Next update will be on Tuesday 2022-08-23 at 13:00 UTC (15:00 CEST) at the latest
------
We have been closely monitoring our systems for the last 2 days and are comfortable with resolving the incident.
It has not been possible to reproduce the error in any of our manual tests.
At present, we are waiting on feedback from Mastercard to clarify exactly what caused the error.

We have escalated the incident with Mastercard. Once we receive answers, we will write a postmortem and post it here.
Posted Aug 12, 2022 - 11:59 UTC
Update
We have been monitoring the systems for approximately the last 14 hours, and the error rate is down to the level expected during normal operations. Communication with Mastercard regarding this issue is ongoing. We will continue monitoring the systems.
Posted Aug 11, 2022 - 12:42 UTC
Monitoring
In collaboration with Mastercard, it has been determined that unstable DNS resolution and an old, decommissioned Mastercard endpoint were at the root of the problem.

At 2022-08-10T19:56:00Z the DNS issue was resolved and the error rate decreased to near zero percent.
We are still in dialogue with Mastercard about how to avoid similar issues in the future.
Posted Aug 10, 2022 - 20:43 UTC
Update
We are in dialogue with Mastercard and waiting for an update.
Posted Aug 10, 2022 - 14:07 UTC
Investigating
This incident has been escalated with Mastercard.
We are awaiting an update. We expect to update this incident within the next 2 hours.
Posted Aug 10, 2022 - 09:36 UTC
Monitoring
Since approximately 2022-08-09T20:40:00Z we have seen a large number of invalid responses from Mastercard's 3-D Secure v2 directory server.

We have contacted Mastercard; however, we cannot yet give a time frame for a resolution.
Posted Aug 10, 2022 - 08:16 UTC
This incident affected: 3DSv2 (3DSS).