Cloud Based Scientific Data Management: The Benefits of Integrating Your Diverse Research

By Anthony Uzzo

The “Cloud” for R&D Experimental Data Capture and Information Management

Modern R&D organizations are rapidly adopting cloud-based IT infrastructures as they externalize a growing segment of operations spanning product research and development. This is particularly evident in the life sciences (pharmaceutical and biotech) industries, where companies are moving from centralized, corporate-based facilities to a virtual network of contract research, development, and manufacturing organizations (CRDMOs) and/or academic institutions and corporate partners to enable product discovery and development. The migration to an external research ecosystem has been driven by the economic realities of R&D over the last decade and by the total cost of ownership (TCO) advantages of cloud-based IT infrastructures over traditional models.

This is especially true where R&D processes are data intensive and central to discovery and intellectual property (IP) creation. Operational efficiencies are realized when data from internal sources and external CRDMOs is consistently captured, analyzed, and reported across the discovery continuum. Moreover, breakthroughs in contemporary instruments and the natural sciences (e.g., genomics) have created an array of big data challenges that require organizations to continuously adapt their IT data management practices to integrate these new technologies and modern sequencing instrumentation. As a result, IT and informatics professionals are charged with an important challenge: to supplant antiquated legacy systems with flexible, scalable, and accessible data management solutions that unify how organizations capture and share information across sites.

Software providers with commercial cloud-based infrastructures and services are poised to meet the evolving needs of the life science industry as this externalized, collaborative IT paradigm matures. Cloud infrastructure providers like Amazon Web Services provide a strong foundation for dealing with the volume and velocity of big data. What is required to effectively manage an externalized research operation, however, is a highly flexible data management platform that can handle a variety of research and development data (cell-based assays, ADME, toxicology, PK, animal studies, gene expression, proteomics, next-generation sequencing) while providing collaboration tools that enable remote access across a range of devices for employees and limited or restricted access for partners (i.e., CRDMOs). Recent “platform-based” IT architecture initiatives have made cloud-based informatics a reality for many organizations.

Big Data – Variety is the Real Challenge

One of the most cited reasons for cloud adoption is the scale and performance of elastic computing infrastructures such as Amazon’s Elastic Compute Cloud (EC2). The value of Infrastructure as a Service (IaaS) is validated by a 57% growth rate that will take the market past $10B in 2016. Scalable cloud infrastructures go a long way toward addressing the volume and velocity challenges associated with big data. On its own, however, this infrastructure does not enable externalized research and development collaboration. To deal with the variety of R&D big data, organizations must invest in a flexible, cloud-based Platform as a Service (PaaS).

A Platform as a Service provides a solution stack as a service, allowing customers to create or download collections of applications that extend and tailor the base functionality of the software to meet their unique requirements. PaaS vendors have already proved to be major disruptive forces in many other enterprise software markets, including CRM, ERP, and HRM. To deal with the challenges associated with the externalization of R&D, the scientific data management software market requires a similar platform technology. A good example of this type of architecture is the Core Informatics Platform for Science (PFS).

The Platform for Science provides a secure public or private multi-tenant data management infrastructure comprising Core LIMS, Core ELN, and Core SDMS capabilities. It is hosted in Amazon Web Services and includes a collection of applications designed to facilitate external research collaboration. To extend the base functionality provided by the Core Platform, users can configure applications or download them from a marketplace of customer- and vendor-sponsored functionality. Be sure to view the PFS technology brief.


The cloud provides many attractive options for facilitating scientific collaboration. Infrastructure providers such as Amazon Web Services (AWS) offer elastic computing Infrastructure as a Service (IaaS), enabling organizations to deal with the volume and velocity of the big data generated from these collaborations. The real challenge, however, lies in the variety of big data. To derive the most value from the results generated by external partners, organizations must invest in a platform technology flexible enough to let users rapidly tailor their data management solution to their unique needs.

For more information on Core Informatics, go here.