On June 15, 2020, one of the biggest wireless outages in US history happened when a large portion of the T-Mobile network failed across the country. The system was down for 12 hours, with calls failing, including calls to 911. If you weren't a T-Mobile subscriber, you likely saw your friends posting on Facebook asking if anyone else was experiencing issues with their phones.
The investigation into the failure showed that the issue was caused by a cascade of failures. Unfortunately for T-Mobile, each of those failures could have been prevented by industry-standard practices, all of which seem to have been skipped. The company was installing new routers in the Southeast when a fiber transport link failed. If everything had been set up correctly, the network would have automatically switched to a secondary link and carried on, business as usual. However, things were not configured correctly, so the switchover never happened, isolating Atlanta's devices from the network.
As the Atlanta devices tried to reassign to Wi-Fi or other nodes, they were blocked by a software issue that directed them back to their previous node, which was also isolated. The reassignment traffic eventually escaped the isolated area and cascaded, bringing more and more of the network to its knees. Voice over LTE (VoLTE) and Voice over Wi-Fi continued to fail nationwide, pushing all voice traffic onto the older, lower-capacity 2G and 3G networks, which caused congestion and failed calls.
While the failure violated FCC guidelines and could have been prevented if the company had done what it was supposed to, the carrier has faced no repercussions. That is not a huge surprise, as the FCC is notoriously lax about enforcing its rules. In fact, the FCC's only real response to the issue and the results of the investigation was a press release that simply reminded all carriers to follow the rules. Hopefully, this will work to prevent these issues in the future.