Jeff Ramsdale, November 12, 2020
Famously, Google implemented Zero Trust and documented it in a series of whitepapers under the banner of BeyondCorp. As they said in their third whitepaper: “One goal of BeyondCorp is to replace trust in the network with an appropriate level of trust in the device.” Establishing that trust requires new infrastructure and services, which requires investment and a high level of technical competence. Additionally, all technology projects of this magnitude must begin as organizational initiatives and depend on unwavering executive support. Having such backing allows planning to commence, beginning with an inventory of existing systems and user profiles, then establishing an orderly move from the old paradigm to the new while ensuring users continue to maintain uninterrupted access.
A move to Zero Trust will either fail or prove painful if a thorough inventory of existing services isn’t captured during planning. A full understanding of system-to-system dependencies and profiles of the categories of users are also necessary. The former serves to prioritize the order in which systems should move out of the corporate network, while the latter informs the order in which users can be weaned off their VPN use. A directed graph of service dependencies may identify “leaf” services that are clients of other services while not having clients of their own, suggesting an early move. Alternatively, a cluster of services may function as a highly coupled unit but with relatively few outgoing service connections and no incoming ones. Such a cluster may be a good candidate for a prioritized collective move to Zero Trust, while services with numerous dependencies may be better served by a later move.
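As a minimal sketch of how such a dependency graph might be turned into a migration order, the Python below groups services into "waves", where each wave contains only services with no clients left in the corporate network. The service names and the `migration_waves` function are hypothetical, and the inventory is assumed to be a simple mapping from each service to the services it calls. This is one possible reading of the ordering described above, not a prescribed algorithm.

```python
from collections import defaultdict

# Hypothetical inventory: each service maps to the services it calls.
calls = {
    "reporting": ["billing", "directory"],
    "billing": ["directory"],
    "wiki": [],
    "directory": [],
}


def migration_waves(calls):
    """Group services into waves with no remaining in-network clients."""
    # Invert the graph: for each service, record which services call it.
    clients = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            clients[callee].add(caller)

    remaining = set(calls) | set(clients)
    waves = []
    while remaining:
        # A service can move once none of its clients are still waiting to move.
        wave = sorted(s for s in remaining if not (clients[s] & remaining))
        if not wave:
            # Everything left is blocked by a dependency cycle; return the
            # remainder as one group to be reviewed and moved together.
            waves.append(sorted(remaining))
            break
        waves.append(wave)
        remaining -= set(wave)
    return waves


for number, wave in enumerate(migration_waves(calls), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

With the sample inventory, the services that nothing else calls (`reporting` and `wiki`) land in the first wave, while `directory`, which everything depends on, moves last. A real inventory would also need to account for user traffic, which this sketch ignores.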
While service dependencies tend to be static, users are altogether less predictable. Particularly in large organizations, however, users have a tendency to fall into profiles of activity, often by job function. That is, developers likely require access to code repositories, wikis, CI/CD pipelines, and deployment platforms, for example, while HR personnel may use an entirely different set of applications. Categorizing users according to the applications they use allows applications to be prioritized for movement to Zero Trust. Each cluster of users whose primary applications move to Zero Trust results in a corresponding reduction in dependence on a VPN for corporate network access.
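One way to use such profiles for prioritization, sketched here under assumptions, is to count how many users each unmigrated application is the last remaining blocker for. The user names, application names, and the `sole_blockers` function are hypothetical, and usage is assumed to be available as a mapping from each user to the set of applications they use.

```python
from collections import Counter

# Hypothetical usage data: each user maps to the applications they use.
usage = {
    "dev1": {"git", "ci", "wiki"},
    "dev2": {"git", "wiki"},
    "hr1": {"payroll", "wiki"},
}

# Applications already served from the Zero Trust network (assumed).
migrated = {"wiki", "ci"}


def sole_blockers(usage, migrated):
    """Count users for whom each app is the only one still requiring the VPN."""
    counts = Counter()
    for apps in usage.values():
        pending = set(apps) - migrated
        if len(pending) == 1:
            counts[next(iter(pending))] += 1
    return counts


# Applications whose migration would free the most users from the VPN.
for app, freed in sole_blockers(usage, migrated).most_common():
    print(f"{app}: would free {freed} user(s)")
```

In this sample, migrating `git` would free both developers from the VPN, so it ranks ahead of `payroll`. This is only one heuristic, and it would need to be weighed against the service dependency ordering discussed earlier.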
Instrumenting both user workstations and applications is vital not only to building the initial inventory and determining an order of migration, but also to monitoring usage patterns once the migration begins. This telemetry data can help ensure mistakes aren’t made in prematurely moving services out of the corporate network or revoking user access. For instance, if metrics such as traffic logs show that clients (either users or other services) are making requests from within the corporate network against a given in-network service, that service cannot yet move to the Zero Trust network. This is in keeping with the policy that connections from the new Zero Trust network to the old corporate network should be discouraged.
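The check described above can be sketched in a few lines of Python. The log format here is an assumption: each record is taken to be a `(source_network, client, service)` tuple, and the network labels `"corporate"` and `"zero_trust"`, along with the function name `in_network_dependents`, are hypothetical. Real traffic logs will need parsing into some comparable shape first.

```python
# Hypothetical traffic log records: (source_network, client, service).
traffic_log = [
    ("corporate", "alice", "wiki"),
    ("zero_trust", "bob", "wiki"),
    ("corporate", "billing", "directory"),
    ("zero_trust", "carol", "payroll"),
]

# Services still hosted in the corporate network (assumed inventory).
corporate_services = {"wiki", "directory", "payroll"}


def in_network_dependents(traffic_log, corporate_services):
    """Map each in-network service to the clients still calling it from inside."""
    blocked = {}
    for source_network, client, service in traffic_log:
        if source_network == "corporate" and service in corporate_services:
            blocked.setdefault(service, set()).add(client)
    return blocked


blocked = in_network_dependents(traffic_log, corporate_services)
for service in sorted(corporate_services):
    if service in blocked:
        print(f"{service}: not ready, still used in-network by {sorted(blocked[service])}")
    else:
        print(f"{service}: no in-network clients observed")
```

A service with no in-network clients in the observed window is only a candidate for migration; the observation window needs to be long enough to catch infrequent callers such as monthly batch jobs.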
Once a service inventory and user profiles have been captured and an order of precedence established, the migration work can begin, typically organized into phases. Rather than attempting in-place changes to services and networks, it is suggested that a new Zero Trust virtual network be created and services then migrated to it. Services that remain in the “privileged” corporate network may make calls out to the Zero Trust network but, if possible, calls initiated outside the corporate network should not be allowed into it.
Efforts to reduce the size of the corporate network perimeter are particularly beneficial. For instance, if the corporate network extends from a cloud provider to one or more onsite locations, reducing the perimeter to strictly the cloud may be a positive step, even if it requires users to VPN in. Physical network access within a building, both wired and wifi, provides vectors for a bad actor to penetrate the defenses. Note that many systems may continue to reside onsite, including printers, physical security systems such as cameras and badge readers, and even vending machines. The inventory taken above should have accounted for all such systems, and the migration plan should ensure that they continue to function once moved out of the corporate network.
If Zero Trust were easy to implement, most companies would likely already have done so. Even once the necessary infrastructure is ready, applications themselves may still require rework in order to be compatible with the authentication, telemetry gathering, and other mechanisms necessitated by Zero Trust. Both to conform with these requirements and to benefit from modern development and deployment practices, it may be an opportune moment to containerize applications, introduce a service mesh, deprecate out-of-date technologies, and/or perform other architectural upgrades. These upgrades should pursue simplification, take advantage of the undifferentiated heavy lifting capability of the cloud, and externalize cross-cutting concerns from applications. In addition, some technologies, such as NFS, may be difficult or impossible to support in a Zero Trust model and will need to be replaced. Some applications, however, may be relatively easy to migrate. For instance, web applications may require little more than placement behind the access proxy introduced in the last post.
If the enhanced protection afforded by Zero Trust came at the cost of user experience, one could imagine pushback from frustrated users. Fortunately, by dispensing with a VPN, Zero Trust arguably provides a better user experience. Though ubiquitous 2-factor authentication is enforced, the use of a USB hardware security token can make the login process seamless. User buy-in is likely to grow as more and more applications migrate to Zero Trust and fewer require users to connect to a VPN prior to their use. The eventual retirement of the VPN would therefore be an occasion to celebrate, rather than dread.
That said, because removing a user’s access to an application they still need to do their job could be disastrous, it is vital to capture metrics on user activity to ensure VPN access is not revoked prematurely. Properly instrumented workstations and applications can ensure that each user’s application usage is tracked. If a given user accesses no applications in the corporate network for, say, a month, they could be sent an email warning that their VPN access will be revoked unless they receive a dispensation from a manager. Such exceptions should be discouraged, but only if successful migration of necessary applications has truly enabled the user to function sans VPN. These exceptions should also expire, requiring renewal and thereby serving as bureaucratic motivation to minimize their use. With respect to new users, aiming for Zero Trust by default (that is, only providing VPN access to those who truly need it) should be considered a goal. Achieving this can serve as notable evidence of a Zero Trust migration’s ongoing success.
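The revocation rule described above might be sketched as follows. The 30-day threshold stands in for the "say, a month" figure, and the data shapes are assumptions: `last_corp_access` maps each user to the time of their last request to an in-network application (or `None` if none was ever recorded), and `exemptions` maps users to the expiry time of a manager-granted dispensation. All names are hypothetical.

```python
from datetime import datetime, timedelta


def vpn_revocation_candidates(last_corp_access, exemptions, now,
                              idle=timedelta(days=30)):
    """Return users who should be warned that their VPN access will be revoked."""
    candidates = []
    for user, last_seen in last_corp_access.items():
        # An unexpired exemption keeps the user off the list; expired
        # exemptions are ignored, which forces periodic renewal.
        expires = exemptions.get(user)
        if expires is not None and expires > now:
            continue
        if last_seen is None or now - last_seen >= idle:
            candidates.append(user)
    return sorted(candidates)


now = datetime(2020, 11, 12)
last_corp_access = {
    "alice": datetime(2020, 9, 1),   # idle for more than a month
    "bob": datetime(2020, 11, 10),   # recently active in the corporate network
    "carol": datetime(2020, 8, 1),   # idle, but holds a current exemption
    "dave": None,                    # never seen in the corporate network
}
exemptions = {"carol": datetime(2020, 12, 1)}

print(vpn_revocation_candidates(last_corp_access, exemptions, now))
```

The function only produces a list of users to warn. Sending the email and actually revoking access are left out, since a grace period and a human check between warning and revocation are probably wise.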
In the fourth and fifth of their five whitepapers on the subject of their Zero Trust migration, Google emphasized the need for users to be able to address their own technical issues, both through useful error messages that enable them to self-mitigate and through thorough, searchable documentation and FAQs. Any question a user can answer on their own helps prevent overtaxing the limited resource that is support staff time. Capturing traffic logs and telemetry data in such a way as to be easily accessible for support purposes can also help in keeping productivity high. Since the migration effort is liable to have high user visibility, frequent communication is likely to be necessary. Automating as much of this communication as possible is vital to reducing the human cost of the effort. With careful planning and implementation, friction for users can be reduced to a minimum and staff can spend more time on value-adding migration efforts rather than costly support.
Migrating to Zero Trust is the culmination of a process that may take years to complete, involving planning, measuring, and implementing, while requiring a wide variety of skill sets, both technical and managerial. There is risk of user confusion, even pushback by those resistant to change. Missteps may require rollbacks and rethinking of the planned approach. Metrics, when analyzed, may necessitate reordering long-planned migration sequences. Despite these challenges, the prospect of a more secure network and enhanced usability can help motivate the substantial commitment Zero Trust requires.
Good candidates for moving to Zero Trust are those organizations that are able to implement an entirely new network architecture, build the authentication, proxy, and rules infrastructure to govern access, and roll out a carefully scripted migration over the course of years while continuing to service their existing customers without significant impact. However, for those unable or unwilling to commit to a full migration, the principles of Zero Trust are still valuable in guiding security considerations for their own software development and deployment, whether for internal or external customers. Even modest changes may reduce the surface area for malicious attacks.
The naive belief that a corporate network can be entirely secure has been thoroughly debunked. Trust should be earned by each client, not granted by default, and security ensured through mutual authentication and encryption.