Data Science Service: Health care use cases

Oracle Cloud Infrastructure (OCI) Data Science is a fully managed, serverless platform for data science teams to build, train, and manage machine learning models.

Data Science integrates with the rest of the OCI stack, including Oracle Functions, Data Flow, Autonomous Data Warehouse, and Object Storage. The Oracle Accelerated Data Science (ADS) software development kit (SDK) is a Python library included with the Data Science service. It provides functions and objects that automate or simplify each step of the data science workflow: connecting to data, exploring and visualizing data, training models with AutoML, evaluating models, and explaining models. ADS also provides a simple interface to the Data Science model catalog and to other OCI services, including Object Storage.

Architecture

This flexible architecture supports multiple scenarios across integrated health networks. It is built on Oracle Machine Learning and combines the Autonomous Data Warehouse and Data Science platforms.

In addition to Data Science and Autonomous Data Warehouse, this architecture uses Data Catalog, Oracle APEX Application Development, and Oracle Analytics Cloud. It also uses OCI Compute instances to host applications that stream data from wearable devices to either Autonomous Data Warehouse or Object Storage. The architecture serves two main purposes: storing important data in secure, reliable storage that supports fast retrieval, and building and deploying applications and machine learning models quickly.
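The streaming application on the Compute instances is not described further here. The following minimal Python sketch shows one way such an application might route a batch of wearable readings to the two destinations. The `Reading` record, the routing rule (numeric readings are treated as tabular rows, everything else is kept as a raw object), the object naming scheme, and the in-memory `WarehouseSink` and `ObjectSink` classes are all hypothetical stand-ins for real Autonomous Data Warehouse and Object Storage clients; no OCI API is called.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Reading:
    """One measurement from a wearable device (hypothetical schema)."""
    device_id: str
    metric: str      # e.g. "heart_rate"
    value: Any       # numeric for tabular readings, anything else for raw ones
    timestamp: str   # ISO 8601 string


@dataclass
class WarehouseSink:
    """Hypothetical in-memory stand-in for an Autonomous Data Warehouse table."""
    rows: list = field(default_factory=list)

    def insert_many(self, readings):
        self.rows.extend(
            (r.device_id, r.metric, float(r.value), r.timestamp) for r in readings
        )


@dataclass
class ObjectSink:
    """Hypothetical in-memory stand-in for an Object Storage bucket."""
    objects: dict = field(default_factory=dict)

    def put(self, name, payload: bytes):
        self.objects[name] = payload


def is_structured(r: Reading) -> bool:
    # Assumed routing rule: plain numbers are tabular; bools and other types are raw.
    return isinstance(r.value, (int, float)) and not isinstance(r.value, bool)


def route_batch(batch, warehouse, bucket, batch_id):
    """Send tabular readings to the warehouse and the rest to the bucket."""
    structured = [r for r in batch if is_structured(r)]
    raw = [r for r in batch if not is_structured(r)]
    if structured:
        warehouse.insert_many(structured)
    if raw:
        # One JSON object per batch, using a hypothetical naming scheme.
        payload = json.dumps([r.__dict__ for r in raw]).encode("utf-8")
        bucket.put(f"wearables/raw/{batch_id}.json", payload)
    return len(structured), len(raw)


if __name__ == "__main__":
    wh, bk = WarehouseSink(), ObjectSink()
    batch = [
        Reading("dev-1", "heart_rate", 72, "2024-01-01T08:00:00Z"),
        Reading("dev-1", "ecg_trace", [0.1, 0.3, 0.2], "2024-01-01T08:00:00Z"),
    ]
    print(route_batch(batch, wh, bk, "batch-0001"))  # (1, 1)
```

Keeping the routing decision in one small function (`is_structured`) makes it easy to replace the assumed rule with whatever split between the warehouse and Object Storage your data actually needs.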

The following diagram illustrates this reference architecture.

[Diagram: architecture-datascience-use-cases.png]

The architecture has the following components:

  • Region

    An Oracle Cloud Infrastructure region is a localized geographic area that contains one or more data centers, called availability domains. Regions are independent of other regions, and vast distances can separate them (across countries or even continents).

  • Availability domains

    Availability domains are standalone, independent data centers within a region. The physical resources in each availability domain are isolated from the resources in the other availability domains, which provides fault tolerance. Availability domains don’t share infrastructure such as power or cooling, or the internal availability domain network. So, a failure at one availability domain is unlikely to affect the other availability domains in the region.

  • Fault domains

    A fault domain is a grouping of hardware and infrastructure within an availability domain. Each availability domain has three fault domains with independent power and hardware. When you distribute resources across multiple fault domains, your applications can tolerate physical server failure, system maintenance, and power failures inside a fault domain.

  • Virtual cloud network (VCN) and subnets

    A VCN is a customizable, software-defined network that you set up in an Oracle Cloud Infrastructure region. Like traditional data center networks, VCNs give you complete control over your network environment. A VCN can have multiple non-overlapping CIDR blocks that you can change after you create the VCN. You can segment a VCN into subnets, which can be scoped to a region or to an availability domain. Each subnet consists of a contiguous range of addresses that don't overlap with the other subnets in the VCN. You can change the size of a subnet after creation. A subnet can be public or private.

  • Data Science service

    A fully managed, serverless platform for data science teams to build, train, and manage machine learning models. It integrates with other OCI services, such as Autonomous Data Warehouse and Object Storage.

  • Autonomous Data Warehouse

    An Oracle autonomous database that includes Oracle Machine Learning. Data scientists can build, evaluate, score, and deploy machine learning models by using in-database Oracle Machine Learning features and the related Oracle Machine Learning Notebooks interface. You can also use Autonomous Transaction Processing in place of Autonomous Data Warehouse.

  • Application VM

    An OCI Compute instance with Oracle Linux installed and ready for installation of tools and applications that need access to the database.

  • Data Catalog

    OCI Data Catalog is a fully managed, self-service data discovery and governance solution for your enterprise data. Data Catalog provides a single collaborative environment to manage technical, business, and operational metadata.

  • Oracle Analytics Cloud

    Oracle Analytics Cloud empowers business analysts with modern, AI-powered, self-service analytics capabilities for data preparation, visualization, enterprise reporting, augmented analysis, and natural language processing and generation.

    Oracle Analytics Cloud is integrated with Oracle Machine Learning. This integration allows analysts to list available in-database models and use those models in Oracle Analytics Cloud analytics and dashboards.

  • APEX

    Oracle APEX Application Development is a low-code development platform that enables you to build scalable and secure enterprise applications that you can deploy anywhere. It's included with Autonomous Database and requires no installation. APEX users can access models and results from Oracle Machine Learning.
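The VCN rules described in the component list (each subnet is a contiguous address range inside the VCN's CIDR block and must not overlap the other subnets) can be checked before you create any resources. The following Python sketch uses only the standard `ipaddress` module. The `10.0.0.0/16` VCN block, the /24 subnet size, and the tier names are illustrative assumptions, not values taken from this architecture's deployment code.

```python
import ipaddress


def plan_subnets(vcn_cidr, tiers, new_prefix=24):
    """Carve one contiguous, non-overlapping subnet per tier out of the VCN block."""
    vcn = ipaddress.ip_network(vcn_cidr)
    candidates = vcn.subnets(new_prefix=new_prefix)  # equal-size blocks, in order
    plan = {}
    for tier in tiers:
        try:
            plan[tier] = next(candidates)
        except StopIteration:
            raise ValueError(
                f"{vcn_cidr} has no room for a /{new_prefix} subnet for {tier!r}"
            )
    return plan


def find_overlaps(networks):
    """Return every pair of CIDR blocks (as strings) that overlap."""
    nets = [ipaddress.ip_network(n) for n in networks]
    return [
        (str(a), str(b))
        for i, a in enumerate(nets)
        for b in nets[i + 1:]
        if a.overlaps(b)
    ]


if __name__ == "__main__":
    # Hypothetical tiers for the components in this architecture.
    subnets = plan_subnets("10.0.0.0/16", ["application", "data_science", "database"])
    for tier, net in subnets.items():
        print(tier, net)
    # application 10.0.0.0/24, data_science 10.0.1.0/24, database 10.0.2.0/24
    print(find_overlaps([str(n) for n in subnets.values()] + ["10.0.2.128/25"]))
    # [('10.0.2.0/24', '10.0.2.128/25')]
```

The same `find_overlaps` check applies to any network you plan to connect privately, such as an on-premises range: add its CIDR block to the list and confirm that the result is empty.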

Recommendations

Your requirements might differ from the architecture described here. Use the following recommendations as a starting point.

  • VCN

    When you create a VCN, determine the number of CIDR blocks required and the size of each block based on the number of resources that you plan to attach to subnets in the VCN. Use CIDR blocks that are within the standard private IP address space.

    Select CIDR blocks that don't overlap with any other network (in Oracle Cloud Infrastructure, your on-premises data center, or another cloud provider) to which you intend to set up private connections.

    After you create a VCN, you can change, add, and remove its CIDR blocks.

    When you design the subnets, consider your traffic flow and security requirements. Attach all the resources within a specific tier or role to the same subnet, which can serve as a security boundary.

  • Security

    Use Oracle Cloud Guard to monitor and maintain the security of your resources in OCI proactively. Cloud Guard uses detector recipes that you can define to examine your resources for security weaknesses and to monitor operators and users for risky activities. When any misconfiguration or insecure activity is detected, Cloud Guard recommends corrective actions and assists with those actions, based on responder recipes that you can define.

    For resources that require maximum security, Oracle recommends that you use security zones. A security zone is a compartment associated with an Oracle-defined recipe of security policies that are based on best practices. For example, the resources in a security zone must not be accessible from the public internet and they must be encrypted using customer-managed keys. When you create and update resources in a security zone, OCI validates the operations against the policies in the security-zone recipe, and denies operations that violate any of the policies.

  • Autonomous Data Warehouse

    Create a separate schema for exclusive use by data scientists, and grant that schema read-only access to the main data warehouse schema. Data scientists can then create local views of the data for exploration, analysis, and model building. Where needed, they can copy shared data into their own schema and modify it there.

  • Virtual Machines

    Distribute the VMs across multiple fault domains for high availability. Use a flexible VM shape for the Compute instances so that you can increase or decrease their capacity in minutes.

  • Object Storage

    Object Storage provides reliable, cost-efficient, durable storage and quick access to large amounts of structured and unstructured data of any content type, including database data, analytic data, images, and videos. We recommend using Standard storage to ingest data from external sources, because applications and users can access it quickly. You can create a lifecycle policy that moves data from Standard storage to Archive storage when the data no longer needs to be accessed frequently.
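The Object Storage lifecycle recommendation amounts to an age rule. The Python sketch below only models that rule locally: it lists which objects an "archive after N days" policy would move, assuming (for illustration) that age is measured from each object's last-modified time and that only objects matching a name prefix are in scope. The `objects_to_archive` helper, the inventory tuples, and the 30-day threshold are hypothetical; an actual lifecycle policy is defined on the bucket through OCI, not with this code.

```python
from datetime import datetime, timedelta, timezone


def objects_to_archive(objects, now, archive_after_days=30, prefix=""):
    """Return sorted names of Standard-tier objects older than the threshold.

    objects: iterable of (name, tier, last_modified) tuples (hypothetical shape).
    """
    cutoff = now - timedelta(days=archive_after_days)
    return sorted(
        name
        for name, tier, last_modified in objects
        if tier == "Standard"             # already-archived objects are skipped
        and name.startswith(prefix)       # assumed object-name filter
        and last_modified < cutoff        # older than the archive threshold
    )


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    inventory = [
        ("ingest/2024-03-01.csv", "Standard", now - timedelta(days=92)),
        ("ingest/2024-05-25.csv", "Standard", now - timedelta(days=7)),
        ("ingest/2023-12-01.csv", "Archive", now - timedelta(days=183)),
    ]
    print(objects_to_archive(inventory, now, archive_after_days=30, prefix="ingest/"))
    # ['ingest/2024-03-01.csv']
```

A dry run like this can help you pick a threshold before enabling a real policy, because it shows how much ingested data would leave the quickly accessible Standard tier.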

Considerations

Consider the following points when deploying this reference architecture.

  • Security

    Use policies to restrict who can access the OCI resources that your company has and how they can access them.

  • Application availability

    Fault domains provide the best resilience within a single availability domain. You can deploy Compute instances that perform the same tasks in multiple fault domains. This design removes a single point of failure by introducing redundancy.

  • Cost

    Evaluate your requirements to choose the appropriate Compute shapes.

  • Monitoring and alerts

    Set up monitoring and alerts on CPU and memory usage for your nodes so that you can scale the shape up or down as needed.
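To make the fault-domain guidance concrete, the following Python sketch assigns instances that perform the same task to the three fault domains of an availability domain in round-robin order. It then reports how many instances keep running if the most heavily used fault domain fails. The `FAULT-DOMAIN-1` to `FAULT-DOMAIN-3` names follow OCI's fault domain naming. The VM names and both helper functions are hypothetical illustrations, not part of this architecture's deployment code.

```python
from collections import Counter
from itertools import cycle

# Each OCI availability domain has three fault domains.
FAULT_DOMAINS = ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2", "FAULT-DOMAIN-3")


def place_round_robin(instance_names, fault_domains=FAULT_DOMAINS):
    """Assign instances to fault domains in turn so no domain holds them all."""
    return dict(zip(instance_names, cycle(fault_domains)))


def worst_case_survivors(placement):
    """Instances still running if the most heavily used fault domain fails."""
    if not placement:
        return 0
    per_domain = Counter(placement.values())
    return len(placement) - max(per_domain.values())


if __name__ == "__main__":
    apps = [f"app-vm-{i}" for i in range(1, 5)]  # hypothetical VM names
    placement = place_round_robin(apps)
    for vm, fd in placement.items():
        print(vm, fd)
    print("survivors after one fault-domain failure:", worst_case_survivors(placement))
    # survivors after one fault-domain failure: 2
```

With a single instance, the worst case is zero survivors no matter where it is placed. This is why the redundancy described under Application availability requires at least two instances in different fault domains.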

Deploy

The code required to deploy this reference architecture is available on GitHub. You can pull the code into Oracle Cloud Infrastructure Resource Manager with a single click, create the stack, and deploy it. Alternatively, download the code from GitHub to your computer, customize the code, and deploy the architecture by using the Terraform CLI.

  • Deploy by using Oracle Cloud Infrastructure Resource Manager:
    1. Click Deploy to Oracle Cloud.

      If you aren't already signed in, enter the tenancy and user credentials.

    2. Review and accept the terms and conditions.
    3. Select the region where you want to deploy the stack.
    4. Follow the on-screen prompts and instructions to create the stack.
    5. After creating the stack, click Terraform Actions, and select Plan.
    6. Wait for the job to be completed, and review the plan.

      To make any changes, return to the Stack Details page, click Edit Stack, and make the required changes. Then, run the Plan action again.

    7. If no further changes are necessary, return to the Stack Details page, click Terraform Actions, and select Apply.
  • Deploy using the Terraform code in GitHub:
    1. Go to GitHub.
    2. Clone or download the repository to your local computer.
    3. Follow the instructions in the README document.
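If you deploy from the command line, the plan-then-apply sequence used in Resource Manager maps onto standard Terraform CLI commands. The Python sketch below builds that sequence (`terraform init`, `terraform plan -out=...`, then `terraform apply` on the saved plan). It prints each command and runs them with `subprocess` only when `dry_run` is false. The working-directory path, the variables file name, and the `tfplan` file name are assumptions. The repository's README remains the authority on which variables the stack needs.

```python
import shlex
import subprocess


def terraform_steps(workdir, var_file=None, plan_file="tfplan"):
    """Build the init, plan, and apply command sequence for a Terraform stack."""
    base = ["terraform", f"-chdir={workdir}"]  # -chdir is a global option
    plan = base + ["plan", f"-out={plan_file}"]
    if var_file:
        plan.append(f"-var-file={var_file}")
    return [
        base + ["init"],
        plan,
        # Applying the saved plan file applies exactly what was reviewed.
        base + ["apply", plan_file],
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in steps:
        print("$", shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing step


if __name__ == "__main__":
    # Hypothetical path to the cloned repository and its variables file.
    run_steps(terraform_steps("path/to/cloned-repo", var_file="terraform.tfvars"))
```

Terraform does not ask for confirmation when it applies a saved plan file, so review the `plan` output before you run the sequence with `dry_run=False`. This mirrors reviewing the plan in Resource Manager before you select Apply.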

More Information

To learn more about Oracle Cloud Infrastructure Data Science, see the following resources: