Deploy a Highly Scalable GraphQL Solution

This Reference Architecture demonstrates how you can deploy a GraphQL-based solution that can be easily scaled to meet workload demand.

GraphQL is an open-source data query and manipulation language for APIs, together with a runtime for fulfilling queries with existing data. It suits implementations such as mobile applications, where the client explicitly defines the attributes it requires. It also suits APIs where a client wants to request data across related entities in one call to optimize the request and response flow, for example to minimize network traffic for mobile applications. In addition, GraphQL helps hide from API consumers how the back end is implemented, by providing a single schema that can cover many different back-end services.
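As a rough illustration of why explicit attribute selection reduces payload size, the following minimal Python sketch trims a record down to the requested fields, including one related entity. The record, the field names, and the `select_fields` helper are hypothetical and are not part of GraphQL or of the reference implementation; a real GraphQL server does this work through its schema and resolvers.

```python
from typing import Any

# Hypothetical backend record; names and values are illustrative only.
ORDER = {
    "id": "o-1",
    "status": "SHIPPED",
    "total": 42.5,
    "customer": {"id": "c-9", "name": "Ada", "email": "ada@example.com"},
}


def select_fields(record: dict[str, Any], selection: dict[str, Any]) -> dict[str, Any]:
    """Return only the attributes named in `selection`.

    `selection` maps a field name to None (a scalar field) or to a nested
    selection dict (a related entity), loosely mirroring a GraphQL
    selection set.
    """
    result = {}
    for field, sub_selection in selection.items():
        value = record[field]
        if sub_selection is None:
            result[field] = value
        else:
            result[field] = select_fields(value, sub_selection)
    return result


# Loosely equivalent to the query: { order { id customer { name } } }
trimmed = select_fields(ORDER, {"id": None, "customer": {"name": None}})
print(trimmed)  # {'id': 'o-1', 'customer': {'name': 'Ada'}}
```

Only the two requested attributes travel back to the client, even though the underlying record and its related customer entity hold more data.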

Architecture

This architecture defines different routes for static and dynamic API content. This approach makes it easier to optimize access to the static content by configuring caching options at several layers, including by using a Content Delivery Network (CDN), at the load balancer, and within the backend services that load and serve the content.
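One way to picture the split between static and dynamic routes is a helper that chooses a `Cache-Control` header per request path. This is a minimal sketch under stated assumptions: the path prefixes, the one-day lifetime, and the choice of `no-store` for API responses are all hypothetical defaults, not values taken from this architecture.

```python
# Hypothetical route prefixes and cache lifetime; tune these for your content.
STATIC_PREFIXES = ("/static/", "/images/")
STATIC_MAX_AGE_SECONDS = 86400  # one day


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value based on the request path."""
    if path.startswith(STATIC_PREFIXES):
        # Static content can be cached by the CDN, load balancer, and browser.
        return f"public, max-age={STATIC_MAX_AGE_SECONDS}"
    # Dynamic API responses are not cached in this sketch (a conservative assumption).
    return "no-store"


print(cache_control_for("/static/app.css"))  # public, max-age=86400
print(cache_control_for("/graphql"))         # no-store
```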

In this architecture, the backend services are implemented as microservices. The static content could instead be served by a CMS solution, such as Oracle Content and Experience, or by an open source CMS. Because the content is static, the security risks are considerably lower.

API content is routed through an API gateway so that the API requests can be validated and controlled (for example, rate limited). The traffic is then sent into the Oracle Kubernetes Engine load balancer, the point of ingress into the cluster, which directs it to a GraphQL server. Ideally, if microservices are used to serve the static content, the services supporting GraphQL and the static content are held in separate namespaces for enhanced isolation and control.

This implementation uses the open-source Apollo GraphQL server to receive the invocations and distribute the work across separate microservices that host the resolver and mutator logic. Implementing the different subdomains of the data model as separate services lets you scale each part of the implementation independently. It also makes it easier to tune the solution with in-memory caching.
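The idea of one resolver per subdomain, each with its own in-memory cache, can be sketched in Python as follows. The resolver functions here are hypothetical stand-ins that return fixed data; in the architecture described, each would be a call to a separate microservice, and the real Apollo server configuration lives in the GitHub repository.

```python
from functools import lru_cache

# Counts how often each hypothetical subdomain service is actually invoked.
CALLS = {"customers": 0, "orders": 0}


@lru_cache(maxsize=1024)
def resolve_customer(customer_id: str) -> tuple:
    """Stand-in for a call to a customer subdomain service."""
    CALLS["customers"] += 1
    return (customer_id, "Ada")


@lru_cache(maxsize=1024)
def resolve_orders(customer_id: str) -> tuple:
    """Stand-in for a call to an order subdomain service."""
    CALLS["orders"] += 1
    return ("o-1", "o-2")


# Each top-level field is routed to the resolver for its subdomain.
RESOLVERS = {"customer": resolve_customer, "orders": resolve_orders}


def resolve(field: str, customer_id: str) -> tuple:
    return RESOLVERS[field](customer_id)


resolve("customer", "c-9")
resolve("customer", "c-9")  # answered from the in-memory cache
resolve("orders", "c-9")
print(CALLS)  # {'customers': 1, 'orders': 1}
```

Because each subdomain has its own resolver and cache, cache size and invalidation can be tuned per subdomain. A production cache would also need an expiry policy, which `lru_cache` does not provide.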

The following diagram illustrates this reference architecture.

[Figure: deploy-hs-graphql.png]

deploy-hs-graphql.zip

The code and documentation to implement the architecture, including example services, can be found at the relevant GitHub repository (see the Deploy topic, below). The API route uses an Apollo GraphQL Server and Python services for each subdomain. More detail is included in the GitHub documentation provided.

This architecture has the following components:

Tenancy
A tenancy covers all the regions being used. For maximum performance and resilience, you might want to replicate the deployment in several regions around the world and use public DNS routing to direct clients to the nearest region, which minimizes latency. This requires backend data to be replicated between regions.
Region
An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).
Compartment
Compartments are cross-region logical partitions within an Oracle Cloud Infrastructure tenancy. Use compartments to organize your resources in Oracle Cloud, control access to the resources, and set usage quotas. To control access to the resources in a given compartment, you define policies that specify who can access the resources and what actions they can perform. Compartment controls for this architecture could be added to separate the public access layer and the backend to minimize the risk of accidentally creating direct paths to the security layer.
Availability domains
Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don't share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region. In this reference architecture, Kubernetes worker nodes are distributed across both fault domains and availability domains for maximum resilience.
Fault domains
A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.
Virtual cloud network (VCN) and subnets
A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.
Load balancer
The Oracle Cloud Infrastructure Load Balancing service provides automated traffic distribution from a single entry point to multiple servers in the back end. In this reference architecture, the load balancer includes routing policies that separate traffic for dynamic data, which is directed to the API gateway, from traffic for static content (for example, images and web pages). Static content can then take advantage of the Web Application Acceleration (WAA) capabilities of the load balancer.
Security list
For each subnet, you can create security rules that specify the source, destination, and type of traffic that must be allowed in and out of the subnet.
NAT gateway
The NAT gateway enables private resources in a VCN to access hosts on the internet, without exposing those resources to incoming internet connections.
Service gateway
The service gateway provides access from a VCN to other services, such as Oracle Cloud Infrastructure Object Storage. The traffic from the VCN to the Oracle service travels over the Oracle network fabric and never traverses the internet.
Cloud Guard
You can use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define.
Security zone
Security zones ensure Oracle's security best practices from the start by enforcing policies such as encrypting data and preventing public access to networks for an entire compartment. A security zone is associated with a compartment of the same name and includes security zone policies or a "recipe" that applies to the compartment and its sub-compartments. You can't add or move a standard compartment to a security zone compartment.
Autonomous database
Oracle Cloud Infrastructure autonomous databases are fully managed, preconfigured database environments that you can use for transaction processing and data warehousing workloads. You do not need to configure or manage any hardware or install any software. Oracle Cloud Infrastructure handles creating the database, as well as backing up, patching, upgrading, and tuning the database.
Container Engine for Kubernetes
Oracle Cloud Infrastructure Container Engine for Kubernetes is a fully managed, scalable, and highly available service that you can use to deploy your containerized applications to the cloud. You specify the compute resources that your applications require, and Container Engine for Kubernetes provisions them on Oracle Cloud Infrastructure in an existing tenancy. Container Engine for Kubernetes uses Kubernetes to automate the deployment, scaling, and management of containerized applications across clusters of hosts. The Kubernetes cluster has nodes allocated to different fault domains and availability domains to maximize resilience and availability. Ideally, the Kubernetes cluster also has Istio or another service mesh capability to manage and monitor the microservices. The GraphQL service runs in its own pod, and the various resolvers and mutators are deployed into the Kubernetes cluster as their own microservices.
Registry
Oracle Cloud Infrastructure Registry is an Oracle-managed registry that enables you to simplify your development-to-production workflow. The registry makes it easy for you to store, share, and manage development artifacts, like Docker images. The highly available and scalable architecture of Oracle Cloud Infrastructure ensures that you can deploy and manage your applications reliably.
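The component descriptions above call for Kubernetes worker nodes to be spread across both availability domains and fault domains. The following minimal sketch shows the intent as a round-robin assignment that fills availability domains first. The `place_nodes` function and the `AD-n` and `FD-n` labels are hypothetical; in practice the node pool configuration performs the placement, and this code only illustrates the resulting distribution.

```python
from itertools import cycle, islice


def place_nodes(node_count, availability_domains, fault_domains):
    """Assign nodes round-robin, spreading across availability domains first."""
    # Ordering the slots by fault domain, then availability domain, means
    # consecutive nodes land in different availability domains.
    slots = [(ad, fd) for fd in fault_domains for ad in availability_domains]
    return list(islice(cycle(slots), node_count))


placement = place_nodes(4, ["AD-1", "AD-2", "AD-3"], ["FD-1", "FD-2", "FD-3"])
for node_number, (ad, fd) in enumerate(placement, start=1):
    print(f"node-{node_number}: {ad} / {fd}")
```

With three availability domains, the first three nodes each land in a different availability domain, and the fourth starts on a second fault domain.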

Recommendations

Use the following recommendations as a starting point. Your requirements might differ from the architecture described here.

VCN
When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

After you create a VCN, you can change, add, and remove its CIDR blocks.

When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.
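The CIDR guidance above can be checked before you create anything. This minimal sketch uses Python's standard `ipaddress` module to find overlapping blocks and blocks outside the private address space. It assumes that "standard private IP address space" means the RFC 1918 ranges, and the planned CIDR blocks are made-up examples.

```python
import ipaddress
from itertools import combinations

# RFC 1918 private ranges (assumed to be the intended "standard private" space).
RFC1918 = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def find_overlaps(cidrs):
    """Return every pair of CIDR blocks that overlap."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    return [(str(a), str(b)) for a, b in combinations(networks, 2) if a.overlaps(b)]


def outside_rfc1918(cidrs):
    """Return the CIDR blocks that are not inside an RFC 1918 range."""
    return [
        cidr
        for cidr in cidrs
        if not any(ipaddress.ip_network(cidr).subnet_of(block) for block in RFC1918)
    ]


# Hypothetical plan: a VCN block, an on-premises block, and two others.
planned = ["10.0.0.0/16", "10.0.1.0/24", "192.168.0.0/24", "172.32.0.0/16"]
print(find_overlaps(planned))    # [('10.0.0.0/16', '10.0.1.0/24')]
print(outside_rfc1918(planned))  # ['172.32.0.0/16']
```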
Security
Use Oracle Cloud Guard to monitor and maintain the security of your resources in Oracle Cloud Infrastructure proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with taking those actions, based on responder recipes that you can define. For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, Oracle Cloud Infrastructure validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.
Cloud Guard
Clone and customize the default recipes provided by Oracle to create custom detector and responder recipes. These recipes enable you to specify what type of security violations generate a warning and what actions are allowed to be performed on them. For example, you might want to detect Object Storage buckets that have visibility set to public. Apply Cloud Guard at the tenancy level to cover the broadest scope and to reduce the administrative burden of maintaining multiple configurations. You can also use the Managed List feature to apply certain configurations to detectors.
Network security groups (NSGs)
You can use NSGs to define a set of ingress and egress rules that apply to specific VNICs. We recommend using NSGs rather than security lists because NSGs enable you to separate the VCN's subnet architecture from the security requirements of your application.
Load balancer bandwidth
While creating the load balancer, you can either select a predefined shape that provides a fixed bandwidth or specify a custom (flexible) shape where you set a bandwidth range and let the service scale the bandwidth automatically, based on traffic patterns. With either approach, you can change the shape at any time after creating the load balancer.
API Gateway
The API Gateway can be used to provide an initial level of screening and usage controls such as:
- Service Authentication & Authorization
- Service controls such as rate-limiting
- Capture of service use analytics
In addition, the API Gateway (not the firewall or the load balancer) should perform solution-aware routing, so that any endpoints not served by the GraphQL capabilities are directed to the correct location. Set reasonable rate limits based on both the maximum throughput that the backend solutions can support and the peak entitlement of any one service user.
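To make the rate-limit guidance concrete, here is a minimal token-bucket sketch in Python. It does not represent how the API Gateway implements rate limiting, which you configure through the service itself. The class, the `per_client_rate` helper, and the example numbers are hypothetical; the helper simply caps a client's rate at the lower of the backend capacity and the client's entitlement.

```python
import time


def per_client_rate(backend_max_rps: float, client_entitlement_rps: float) -> float:
    """A client may never be granted more than the backend can serve."""
    return min(backend_max_rps, client_entitlement_rps)


class TokenBucket:
    """Allow bursts up to `burst` requests, refilled at `rate_per_second`."""

    def __init__(self, rate_per_second: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the burst size.
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Hypothetical figures: backend sustains 500 requests/s, client is entitled to 50.
bucket = TokenBucket(rate_per_second=per_client_rate(500, 50), burst=10)
accepted = sum(bucket.allow() for _ in range(25))
print(f"accepted {accepted} of 25 back-to-back requests")
```

The `clock` parameter exists only so that the behavior can be tested deterministically with a fake clock.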

Considerations

Consider the following points when deploying this reference architecture.

Performance
This architecture lets you build out capacity both vertically and horizontally. Even with the most optimized microservices, bringing new nodes online takes time, so allow for this lag when scaling either manually or dynamically. This is particularly true for the persistence layer of any solution.
Security
Address application-level security at the API Gateway. You can address fine-grained, GraphQL-specific security (for example, attribute-level access) by using GraphQL directives such as @auth.
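Attribute-level access control can be pictured as a filter that removes fields the caller's roles do not permit. The sketch below is not an implementation of an @auth directive, whose behavior depends on the GraphQL server in use. The `FIELD_ROLES` table, the role names, and the `authorize_fields` function are hypothetical stand-ins for rules that a directive might declare in the schema.

```python
# Hypothetical field-to-role rules; a schema directive would normally declare these.
FIELD_ROLES = {
    "id": {"admin", "user"},
    "name": {"admin", "user"},
    "email": {"admin"},
}


def authorize_fields(record: dict, caller_roles: set) -> dict:
    """Return only the attributes that at least one caller role may read."""
    visible = {}
    for field, value in record.items():
        allowed_roles = FIELD_ROLES.get(field, set())  # unknown fields are denied
        if allowed_roles & caller_roles:
            visible[field] = value
    return visible


customer = {"id": "c-9", "name": "Ada", "email": "ada@example.com"}
print(authorize_fields(customer, {"user"}))   # {'id': 'c-9', 'name': 'Ada'}
print(authorize_fields(customer, {"admin"}))  # all three attributes
```

Denying fields that have no rule is a deliberate choice in this sketch, so that a newly added attribute is not exposed by accident.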
Availability
- Kubernetes
  Fault domains provide resiliency within an availability domain. You can also configure Kubernetes worker nodes to deploy across different availability domains.
- API Gateway, Load balancers, etc.
  Managed services, such as the API Gateway, have their availability handled within a region. You can configure load balancers to ensure coordination between availability domains.
You can further enhance availability, if required, by adopting a multi-region deployment model, although this does increase complexity and coordination.
Cost
The greater the availability and resilience required, the greater the operational cost, because the required volume of redundant compute resources grows. Also consider which GraphQL implementation you will use, what licensing constraints (if any) it imposes, and whether the provider supports it.

Deploy

Deploy this architecture manually by following the instructions in the Oracle DevRel GitHub repository. Loading container images into the Registry and deploying to Kubernetes are manual steps; the process is described in the implementation detail included within the GitHub documentation.

Explore More

To learn more about GraphQL, see the following resources:

Acknowledgments

Author: Phil Wilkins