Keeping the infrastructure of the country safe and secure is a non-negotiable need, but these same systems are constantly being targeted by cyber criminals’ intent on disrupting operations. Though the United States electricity grid is considered to be safeguarded and reliable, no system is impermeable. While the grid regularly faces system failures and effects of natural disasters one threat that the United States electricity sector has yet to face is a large-scale cyber-attack that has catastrophic consequences. This is good news. But at the same time, cyber-attacks on all industries are becoming more frequent and the threat to the energy sector is no different. In the first three months of 2018, there was a 32 percent increase in cyber- attacks on US industries from the previous year. Within the first six months of 2019, over 4 million data breaches occurred.
While all cyber-attacks are of concern, the unique concerns for the electricity sector lie in the potential for a large-scale attack where multiple utility companies are hit simultaneously, or an attack on a critical utility company where there are compounding effects on others that cause a domino-like impact across the sector. The result of either of these types of attacks would be a crisis for the impacted utility, but in addition, there would potentially prolonged outages or other damages since there might be insufficient resources available to assist in recovery and returning to normal operations.
- This hypothetical case study was prepared by Wellesley student Sarah Chou, MIT student Michael Sapienza and MIT CAMS Executive Director Dr. Keri Pearlson for discussion and teaching. Any resemblance to any real organization is purely accidental. The authors would like to thank Hans Olsen, Mike Steinmetz and several other reviewers who asked to remain anonymous. Thank you to contributors Jess Smith from Pacific Northwest National Laboratory, Scott Baker, Jonathon Monken and Steve McElwee from PJM Interconnection, and Jake Schmitter from Electricity Information Sharing and Analysis Center (EISAC) and a number of other contributors who also asked to remain anonymous. This material is based upon work supported by the Department of Energy under Award Number DE-OE0000780. This report was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.
Cyber threats increase due to the evolution of technology. Newer developments such as the internet of things (IoT), the cloud, smart grid tech, and IT/OT convergence are intended to increase efficiency and replace outdated systems, but they also introduce gateways for possible attacks. IoT specifically is quickly becoming more vulnerable, with nearly 3 billion cyber-attacks recorded as of September 2019. When everything is connected to a network, even one that is ‘air gapped’ from the Internet, it becomes easier to bring down the whole system with a well- placed cyber-attack. Further, as innovations arrive at an increasing pace, it also becomes even more difficult for managers to adapt operations and response plans to keep pace with dynamic changes.
For many, it’s not a matter of ‘if a major attack will happen’ but instead “when,” and how to prepare for the “inevitable” as MIT Professor Stuart Madnick says. Hackers are becoming smarter, constantly finding new ways to launch attacks, and defending against the unknown is difficult, if not impossible. Just follow the press, where many examples of countries such as Ukraine and Saudi Arabia faced cyber-attacks on industrial control systems in the energy delivery ecosystem. Should that happen in the U.S., the real effects of a cyber-attack on the energy delivery sector can range from momentary power outages to catastrophic physical infrastructure damage. The ramifications of a cyber-attack also reach further than the effects to the grid itself, threatening other critical infrastructure ecosystems such as the transportation industry, the water supply, and the economy as a whole.
Consider this hypothetical case study and the potential wide-reaching ramifications. On an average day in July where the electricity grid was functioning normally, areas of Massachusetts and Rhode Island began experiencing power outages. It was quickly identified that the areas affected belonged to two companies that serviced consumers across the two states: a transmission company Accelerated Grid and distribution company Light for All. Together, the two companies were responsible for power in parts of the Greater Boston Area and Providence, and most of southeast Massachusetts. The cyber-attack resulted in damage to a step-down substation run by Accelerated Grid, which ultimately affected the connected Light for All distribution lines. Within a few minutes, 1 million people were experiencing blackouts, and the number was growing. A short while later, states in the Midwest, namely in Illinois and Indiana also began experiencing power outages. Only one company, Connected Utilities, was affected in this region, though their transmission lines were connected to their own distribution substations responsible for bringing power to residents in cities such as Bloomington and Effington. Around 150,000 people lost power in that region. (Figure 1)
The companies initially believed the outages were a result of a system failure and began the normal detect and recover processes. It took five hours to diagnose the cyber-attack. Given that cyber- attacks are not constrained by geographical boundaries, the threat actors were able to target these two completely different highly populated geographical areas without much effort.
Prior to this, neither the southern Midwest nor eastern Massachusetts and Rhode Island had ever experienced major disturbances to the grid, since like most utility companies, their local suppliers had taken a proactive role to ensuring that the grid was as strong and stable as possible. The companies contracted with vendors to make sure their software was up to date and they had already begun to phase out old and vulnerable equipment. In addition, they also had strong, consistent network monitoring and cybersecurity processes. In the event of an anomaly in their network, retainers with third-party cybersecurity platform vendors kicked in. The companies involved believed they were protected against an event like this up until now.
All three companies had response plans that outlined responsibilities for real time operators to restore power. The employees also went through training that taught them what to do during an emergency situation. As transmission companies, Accelerated Grid and Connected Utilities were subject to the NERC CIP standards, which describe standards for transmission companies that include aspects of required training, incident reporting, and vulnerability assessments, among other things. Part of the standards requires companies to train employees on items such as identification and recovery of cyber incidents, all intended to prepare for an attack on critical infrastructure. Because CIP standards do not specifically state the required preparation metrics, leaders at the companies felt they were prepared for a cyberattack, but in reality, they had never performed a physical response drill. Their preparation had been tabletop exercises or communications exercises which did not simulate real time effects of a cyberattack, such as loss of communications, limited access to in-house expertise, or response to physical damage. In addition, they had not practiced recovery with their vendors, and while they knew who to call, they did not have additional plans in place should their standard vendors be unavailable or unable to assist. On the other hand, Light for All as a distribution company falls under the jurisdiction of state governments, which focuses on ensuring that the state offers aid to companies in need both in its resilience plans and response efforts. However, all states have different standards and there is inconsistency in the way they are organized and presented.
BACKGROUND ON THE UTILITY COMPANIES
Like most utility companies, Accelerated Grid had in-house teams responsible for spearheading response efforts when emergencies occur. On-site operators were familiar with all aspects of their substations and received training on how to manually fix all parts of the system when something failed. They also had a team of industrial control experts to handle damage or disruptions to substations. Accelerated Grid had recently hired two cyber experts that specialized in diagnosing and analyzing cyber problems to find the root cause. As a larger company responsible for supplying power to parts of major cities including Boston and Providence, Accelerated Grid also had sophisticated network monitoring capabilities, which allowed them to recognize issues relating to server connections, end point failures and other potential network issues. The company also had trained their teams with simulated cyber- attacks on their system, to ensure that the experts were familiar with the available resources.
One initiative in the cyber plans for Accelerated Grid was the Cyber Mutual Assistance (CMA) program, the Electricity Subsector Coordinating Council (ESCC)’s program dedicated to aid in the event of a cyber emergency. Accelerated Grid expected CMA to send personnel and equipment in the event of a cyber emergency, as their network of experts offered specialized knowledge on these critical issues that Accelerated Grid employees did not have. However, CMA faced similar limitations in resource availability as other vendors. Transportation and quantity of energy to affected utilities was available but in a limited manner and for a short time. Long term or widely impacted geographical attacks would severely tax the CMA support. CMA is also limited due to the nature of cyber-attacks; lack of geographical boundaries leaves nearly every company vulnerable to the threat, and many may not voluntarily lend their resources in case they may get hacked in the near future. They are supposed to seek CMA help when their in-house cyber employees have exhausted their knowledge in the problem at hand, though there is no objective time in which they should call.
As a larger company, Accelerated Grid maintained a direct relationship with the Department of Homeland Security (DHS). The relationship included an agreement that in the event of a situation where it was needed, DHS would send a fly away team to aid in restoring power. In their plan, employees at Accelerated Grid worked with the National Cybersecurity & Communications Integration Center (NCCIC) and the Director of the Hunt and Incident Response Team (HIRT) in the event that all other internal operations to restore power failed, or if the situation became too severe for the in-house teams to handle.
Light for All
Though Light for All’s distribution system is connected to Accelerated Grid, the two were separate entities and did not share resources when it came to response mechanisms. Light for All’s response plan however, also included on-site operator training on repairs to the substation and specific parts of the transformer systems. However, their response plans stated that they needed to call Stronger for You, their industrial control systems vendor, for help with repairs in the event of a blackout. Light for All did not have any in-house cyber employees. Instead they had a relationship with a CV1, a cyber vendor who was on contract to identify, respond and assist in recovery from a cyber-attack. However, their plans did not clearly state when the employees should call the cyber vendor for help. Managers used CV1 primarily for issues with data breaches or hacks in their corporate office. While their contract had provisions for assistance in the event of an attack on their industrial control systems, CV1 expertise in industrial control systems was limited. CV1’s Boston office had 10 employees on call to assist customers with cyber-attacks, including Light for All.
Connected Utilities’ response plan for emergencies relied on external help, given the limited amount of resources and personnel trained to handle blackouts and potential cyber- attacks. Like the other two companies, operators at Connected Utilities had the responsibility to fix broken systems and took training for emergency situations. But their plans were not mature or very detailed about response scenarios for situations such as the one they were facing. Operators were not real clear on when to call their ICS vendor, Illinois Electric Systems. Unlike Light for All’s plan that required them to immediately call when a blackout occurs, Connected Utilities’ response plan was ambiguous about waiting for the vendor to start repairs versus trying to fix the system while they wait. It was also not clear what level of damage was needed in order to call for help. On the cyber front, Connected Utilities did not have any in-house cyber experts trained to analyze and remove malware to get systems cleaned up and working properly again. Instead, they maintained a relationship with a CV1’s Chicago office, who came every month or so to examine their systems. Their response plan did not clearly indicate situations when operators should call CV1 for assistance. In fact, Connected Utilities did not have mechanisms to definitely determine when a cyber-attack was occurring. As a smaller entity serving suburban and rural areas of Indiana and Illinois, Connected Utilities was in the process of establishing ties with state and local governments to create paths for assistance and support, but had struggled to finalize these relationships due to the government prioritizing companies with larger and more populated service areas before Connected Utilities. When they finally did call CV1, they faced a similar problem Light for All did, where the knowledge of industrial control systems was limited, and they did not have enough employees to assist the company. (Figure 2)
A Malware Attack Occurred
During the weeks prior to the outage, all three companies had, ironically, been working with third party cybersecurity and operational technology (OT) vendors to upgrade the security of their systems. Following normal procedures, upgraded firewalls, new authentication processes and stronger virus protection systems were installed by the cybersecurity vendors, and OT vendors used remote access to the ICS operations to upgrade their systems and insure they were not impacted by the new, increased security. Remote access was a common way for OT vendors to assess the company’s networks and processors, perform maintenance, and run diagnostics remotely.
Unfortunately, a newly created malware, Indestructor, was injected into the energy companies using the same remote access system that the OT vendor used to manage the ICS. The actor (a hacker group) was able to utilize an exploit for the remote access system and install a backdoor to give them maintained access and bypass encryption and authentication measures that were otherwise needed to use these systems. The hacker group actually targeted these three companies due to a similarity in their OT system. They first hacked the vendor to identify where the new OT system was going, and identified Accelerated Grid, Light for All, and Connected Utilities as their targets. They then created malware to attack all three. The hackers were able to access the systems quickly through the backdoors and established a connection between the hacker’s external server and the energy company’s internal server. They installed the malware, which was capable of taking advantage of the DNP3 protocols used by the infrastructure to control the OT systems and create a new communication process that connected the hackers directly to the utility systems. Indestructor was then able to issue direct commands to a Remote Terminal Unit (RTU) in the grid, which led to a triggered opening and closing of the circuit breakers and caused substations to de- energize, and re-energize, affecting the balance of power in the grid. The malware was eventually able to shut down multiple substations in the transmission system. (Figure 3)
Step-down substations lower the voltage so that distribution systems could deliver energy to homes and local buildings. In Massachusetts and Rhode Island, Accelerated Grid’s substations worked in conjunction with the distribution system run by Light for All, making up a substantial portion of the grid. With the cascading effects of the malware on the transmission substations, over 1 million people in Massachusetts and Rhode Island were left without power. The malware also disabled any self-correcting or automatic mechanisms that could have normally helped to restart the systems. Data from the utility company’s computers were deleted. Furthermore, a Connected Utilities substation also experienced major physical damage which was reported when smoke began appearing from one transformer, likely due to the increased pressure from the erratic behavior of the circuit breakers. (Figure 4)
Immediately following the outage, local businesses and other infrastructure operations began facing serious complications as well. Critical institutions such as hospitals and government buildings had backup generators, but other industries that affected the greater public such as transportation, begin to suffer. In the Northeast, public transportation came to a momentary halt; the subway on the outskirts of Boston were delayed several hours, then came back with limited services using backup generators. Bus service was also impacted as traffic lights and signals switched to back up generators. The outage resulted in prolonged traffic, people having to walk places, and a surge in prices for ride share services. While there are no underground trains in the areas affected in Illinois and Indiana, buses experienced similar delays, leaving numerous people behind their daily schedules.
Another area of concern was the water industry, given that electricity is needed for water plants that deliver clean and safe water to homes. Furthermore, even though backup generators exist in hospitals and other critical buildings, the water comes from outside sources and must be pumped through a system that both sanitizes and delivers water. The disruption in the energy delivery system created a problem for sanitation. Since electricity was cut off in both the Northeast and in the Midwest, millions of people were left without drinkable water. Plumbing also becomes a major concern, and people are forced to find different sources of water in order to use restrooms, showers, sinks and other systems.
Furthermore, the outage also cut phone lines, and those without cell phones were unable to make calls. While longer term concerns included charging batteries for cell phones, those who were able to send messages or make calls were often unable to connect to their target person. Cell towers had backup generators but concerns about how long that would last also arose. This made it difficult for authorities to deliver critical information to people in a timely manner. There was no immediate information released regarding road closures or public transportation delays as well as possible accidents leaving people confused and anxious about safety. Updates on power restoration for the public were made, but how far they reached was uncertain.
With response plans in place, the real time operators on site were instructed to attempt to restore power themselves. However, with the physical damage that occurred, the operators on scene struggled to do so. They were still under the impression that the damage came as a result of too much demand, or a natural cause, much like that of the Northeast Blackout of 2003, where transmission lines sagged into overgrown trees and caused a multiple day outage across the Northeast and Midwest of the country. Within a few hours, Accelerated Grid, Light for All, and Connected Utilities quickly exhausted their initial response mechanisms and managers knew that they need further assistance in order to recover. Each company had a different approach to their response and recovery.
Accelerated Grid quickly received calls from customers about the blackout and reported that nothing out of the ordinary, such as a tree damaging a transmission line, occurred. Industrial control system experts from the company examined the issue. They were unable to trace the cause of the outage and or identify how the substations shut down so quickly. The employees at Accelerated Grid considered the list of possible causes, with cyber threat as an increasing possibility because all data from their computers had been wiped out. The in-house cyber employees were unable to get to the central location quickly. After five hours, their in- house employees arrived and completed their analysis of the systems. They were unable to isolate the malware. The company had not included a response for a data-wiping attack in their response plans and had no current backups for their control systems to help restore power. Their response plans indicated that they would reach out to CMA if something like this occurred, and CMA was contacted. CMA attempted to provide services but was unable to provide enough power to cover the entire outage area, citing distance, transportation, and concern about contaminating other systems with the malware as issues preventing a full-scale alternative energy source. As their response plan stated, the employees also decided to call the DHS fly-away team, who ensured that they would be given assistance as soon as possible.
Light for All faced different circumstances than Accelerated Grid, as their systems were not directly infected by the malware. Since Light for All’s distribution substations were directly connected to Accelerated Grid’s transmission lines, they experienced an outage in delivery capability but had to wait for the transmission system to be fixed. In the meantime, their response plan for outages required them to reach out to their contracted industrial control experts at Stronger for You. The experts analyzed the effects of the malware on their part of the grid and found that no damage to the substations or lines had occurred. Their customers were without power, but Light for All had no options for using alternative sources to recover.
In the Midwest, the situation was also different. Since Connected Utilities had limited in- house resources, they called their contractors. Since one substation that fed major distribution centers experienced major physical damage, the company’s in-house operators were unsure of their next steps. The fire department had been called and the smoldering substation was no longer on fire. The operators attempted to get the power up and running, but due to the recent upgrades in their OT systems, the company did not have a backup transformer readily available. They had a plan to invest in one in the near future, but that was not going to help fix this emergency. They went through the steps outlined in their business continuity plans for restoring power but found that they are unable fix the failed systems. The employees began to consider other root causes for the failure in their control systems. After nearly six hours of analysis and forensics, they heard about the outage in the Northeast where a cyber-attack may possibly have been the cause. They contacted the Chicago office of CV1 for assistance, only to learn that it would be another five hours before they could reach the site, due to added traffic from the outage. (Figure 5)
Transition to Recovery
After a few days, all three companies were able to restore power delivery, and the companies and the grid as a whole were on the road to recovery. Indestructor, the malware, wiped out most of its own footprint, but cyber vendors were able to analyze its effects and study how it was able to take down a major portion of the grid. Officials were called in, and the cause of the outage was released to the general public to assist others should similar issues be noticed.
However, other long-term consequences of the attack were creating new concerns for the energy delivery companies. Top on the list were physical damage and future vulnerabilities. Executives at the three affected companies held emergency executive team meetings to identify resources necessary to avoid similar issues in the future and focused on the steps necessary to ensure that their systems were safe and protected. Working with their OT vendors, the ICS team and their cybersecurity peers removed the backdoor, and installed new authentication measures, patches, and software to decrease the system’s vulnerability. Supply chain vendors were given new passwords and procedures for accessing ICS systems. To further prevent a reoccurrence in the future, executives from the energy delivery companies created an organization to make it easier to share information with each other about outages and recovery mechanisms to protect others from an event such as this one.
The other industries impacted by this cyber-attack also had a difficult time recovering. The transportation across the four different states resumed normal services after three days, but the wake of their outage caused citizens and local governments to reexamine their own recovery and response mechanisms. Following this incident, the city transportation departments started brainstorming ways for their services to be more resilient. They planned on installing additional backup generators for power and established relationships with energy specialists should they need additional help. The water industry faced greater complications; without sanitary water, they were forced to create additional partnerships with nearby towns and states to get drinkable water to their citizens, which would be costly. It took a week for normal operations to resume and for the water plants to function properly, as there was heightened fear that the outage may have induced further damage. The communications sector gradually recovered as power was restored area by area. Telecom and cellular service companies also made plans to have additional backup power supplies and larger companies in the areas of the outages considered plans for alternative telecommunications should their landlines and cell phones not work during an emergency situation.
Luckily, few people were injured as a result of the power outage. Since hospitals had strong backup generators, critical patients were successfully cared for, and other urgent cases that occurred were diverted to hospitals that could accommodate them.
While the local and state government buildings did not lose power, officials began thinking about damage to critical infrastructure and how to aid the utility companies next time. Topics of discussion ranged from establishing closer relationships with critical infrastructure companies, changing standards for response mechanisms, local coordinating boards to assist in emergency situations, review of regulations that might assist or inhibit response and recovery, and additional planning and response processes for officials.
In the meantime, the ESCC decided to convene a meeting to discuss the Indestructor event, given that it was the first large-scale cyber-attack on an industrial control system in energy delivery subsector of the U.S. Even though the utility companies responded to and recovered from the attack, the ESCC believed that the process could have been more efficient and effective in practice. What could the ESCC do to assist utility companies, so this type of malware attack did not happen again? How could the ESCC assist utility companies of all sizes and capabilities with their cybersecurity response and recovery plans? Indestructor was an attack that impacted two different utility companies in geographically different regions at the same time. Resources were spread thin. Some expected resources were unable to assist in this situation, given the broad geographic impact. The ESCC begins to evaluate the likelihood that this would occur again.
Their meeting led them to questions regarding response mechanisms in mitigating the attack that extend beyond a single, or even two companies. Since another large-scale attack could happen again, the ESCC focused on the ecosystem’s response as a whole. They sought to answer one question: How can all companies be prepared for a cyber-attack? They decided that the response plans needed to include more detail, but what was the most appropriate mitigation plan?