We have been in contact with Mastercard and have been able to construct a timeline for the incident.
Mastercard is currently establishing a new datacenter for EMV 3DS transactions. On 2022-08-09, Mastercard conducted a test where 3-D Secure v2 Directory Server traffic was switched to the new datacenter. The test ran for 10 minutes before the traffic was switched back. The switch was done using DNS.
The load balancer in the old datacenter is configured to rewrite an empty path in HTTP requests to the path expected by the directory servers behind it. This configuration was missing on the load balancer in the new datacenter. As a result, all Authentication Requests (AReqs) sent to the new datacenter received an HTTP 404 response.
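To illustrate the kind of rewrite rule involved, here is a minimal sketch in Python. It is not Mastercard's actual configuration: the path `/areq` and the function names are hypothetical, and we assume that both an empty string and a bare `/` count as an "empty path".

```python
# Hypothetical backend path; the real path expected by the directory
# servers is not known to us and is only assumed here.
DS_PATH = "/areq"


def rewrite_path(request_path: str, ds_path: str = DS_PATH) -> str:
    """Map an empty request path to the path the backend expects."""
    # Assumption: "" and "/" are both treated as an empty path.
    if request_path in ("", "/"):
        return ds_path
    return request_path


def backend_status(request_path: str, ds_path: str = DS_PATH) -> int:
    """Simulate a backend that only serves the expected path."""
    return 200 if request_path == ds_path else 404


# With the rewrite in place, an empty path reaches the backend correctly.
with_rewrite = backend_status(rewrite_path(""))
# Without the rewrite, the same request is answered with 404.
without_rewrite = backend_status("")
```

In this sketch, the same client request succeeds or fails depending only on whether the load balancer applies the rewrite, which matches the behaviour described above.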
At 2022-08-09T19:44Z, our monitoring systems detected a sharp increase in HTTP 404 responses from Mastercard. The error rate fluctuated between 30% and 100%. Our operations team inspected our service and found no network or service abnormalities originating from our systems. On the contrary, all services were able to make AReqs toward Mastercard's directory servers, and the 404 error responses were evenly distributed across all services.
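As an illustration of how such an error rate can be tracked, the following is a minimal sketch of a sliding-window counter. It does not describe our actual monitoring stack: the class name, the window size and the 30% threshold are assumptions chosen for the example.

```python
from collections import deque


class ErrorRateWindow:
    """Track the share of a given HTTP status over the last N responses."""

    def __init__(self, size: int = 100) -> None:
        # Old entries fall out automatically once the window is full.
        self.statuses = deque(maxlen=size)

    def record(self, status: int) -> None:
        self.statuses.append(status)

    def rate(self, status: int = 404) -> float:
        if not self.statuses:
            return 0.0
        hits = sum(1 for s in self.statuses if s == status)
        return hits / len(self.statuses)

    def should_alert(self, threshold: float = 0.3) -> bool:
        # Hypothetical threshold; a real alert rule would be tuned to traffic.
        return self.rate() >= threshold


window = ErrorRateWindow(size=10)
for status in [200] * 7 + [404] * 3:
    window.record(status)
```

Keeping one such window per service instance would also make it possible to check whether errors are evenly distributed or concentrated on a single instance.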
At this point, we were expecting a service disruption announcement from Mastercard.
At 2022-08-10T09:06Z, we escalated the issue with Mastercard.
At 2022-08-10T16:37Z, our case was forwarded to the team responsible for directory servers. This started an investigation that found and added the missing configuration to the new load balancer.
At 2022-08-10T17:42Z, Mastercard reached out and asked us to refresh all network communication caches. Our operations team terminated all established TLS connections and scaled up the number of instances to absorb any potential increase in traffic resulting from the services becoming available again.
At 2022-08-10T18:56Z, the HTTP 404 error rate towards Mastercard's directory servers had dropped to zero, bringing 3-D Secure v2 for Mastercard back to normal operation.
At this point, Mastercard had not provided any of the incident details. We therefore kept our systems in a heightened monitoring state while awaiting further answers.
At 2022-08-12T11:59Z we marked the issue as resolved.
Constructing a timeline of events in collaboration with Mastercard has been ongoing ever since the incident started.
The incident has revealed several necessary improvements to our systems, and work on implementing them has begun.
One of the improvements is better logging of outgoing requests towards directory servers. We would like to be able to detect any change in directory servers ourselves, instead of needing confirmation from a third party. This will improve our response time and enable us to resolve incidents faster.
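One way to detect such a change is to log the set of addresses a directory server hostname resolves to and compare it with the previously seen set. The sketch below is only an illustration under that assumption; the function names are hypothetical and the addresses are taken from the documentation ranges, not from any real directory server.

```python
import socket


def resolve_addresses(host: str, port: int = 443) -> frozenset:
    """Return the set of IP addresses the hostname currently resolves to."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the address is the first element of sockaddr.
    return frozenset(info[4][0] for info in infos)


def diff_addresses(previous: frozenset, current: frozenset) -> dict:
    """Describe which addresses appeared or disappeared between two lookups."""
    return {
        "added": sorted(current - previous),
        "removed": sorted(previous - current),
    }


# Example with placeholder addresses: the hostname moved to a new address.
before = frozenset({"192.0.2.10"})
after = frozenset({"198.51.100.20"})
change = diff_addresses(before, after)
```

Logging a non-empty `added` or `removed` list next to the request logs would let a rise in errors be correlated with a change of destination without waiting for outside confirmation.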
Another improvement is reworking the connection policy for the service responsible for AReqs, including faster termination of TLS connections and faster DNS rediscovery.
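As a rough sketch of what such a policy could look like, the example below caps the lifetime of a connection and the age of a DNS result. The class name and both time limits are assumptions made for illustration and are not the values we will deploy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionPolicy:
    # Hypothetical limits, in seconds.
    max_connection_age: float = 60.0
    dns_refresh_interval: float = 30.0

    def should_close(self, opened_at: float, now: float) -> bool:
        """True when a TLS connection has lived longer than allowed."""
        return now - opened_at >= self.max_connection_age

    def should_re_resolve(self, resolved_at: float, now: float) -> bool:
        """True when the cached DNS result is due for a fresh lookup."""
        return now - resolved_at >= self.dns_refresh_interval


policy = ConnectionPolicy()
# A connection opened at t=0 is kept at t=45 and recycled at t=60.
keep_at_45 = policy.should_close(opened_at=0.0, now=45.0)
close_at_60 = policy.should_close(opened_at=0.0, now=60.0)
```

With bounded connection lifetimes, a DNS-based switch on the remote side would be picked up within a known time window instead of requiring a manual termination of established connections.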
The incident has also shown that our policy for incident escalation was inadequate.
In conjunction with Mastercard, we will improve the communication channel between 3Dsecure.io and the Mastercard Identity Check team.
We are currently updating our incident response plan to incorporate the lessons learned, and will round this work off with a fire drill exercise.
Sincerely,
your operations team