ClearScience and Cerbo IO: Managed High-Performance Distributed Database Cluster to Build a Decision Support System (DSS) for the Advanced Climate Analysis and Forecast (ACAF) System


Clear Science, Inc., founded in 2000, specializes in research, development, and discovery involving Meteorology and Oceanography (METOC) data, Geospatial Information Support (GIS), and Data Handling and Visualization. Clear Science, Inc. was recently selected as one of four awardees of an SBIR (Small Business Innovation Research) grant sponsored by the Office of Naval Research under the area entitled 'Information Systems, Battlespace.'

Navy mission planners need climate and weather data on daily, weekly, monthly, seasonal, and multi-decadal timeframes. The availability of such data is critical for operational planning, logistics, energy use, and more. A DSS that can extract datasets on the fly, perform probabilistic calculations, and present formatted output that answers specific, detailed questions is therefore critical to operational planning.



The Challenge of Big Data in METOC:

<METOC Data Dictionary>



SciDB's Platform and Infrastructure Challenges:


SciDB is a massively scalable array database management system that provides dense data storage and high-performance linear algebra operations on an advanced analytics platform. SciDB uses a shared-nothing logical architecture to achieve scalable massively parallel processing, and that architecture calls for a purpose-built physical platform and infrastructure.
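In a shared-nothing design, each instance computes over only its own data and a coordinator combines the small partial results. The sketch below illustrates that scatter-gather pattern for a mean; it is an assumption-laden toy, not SciDB's query engine: threads stand in for instances, and the partition contents are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def local_partial(readings):
    """Runs on one 'instance' and touches only its own data: returns (sum, count)."""
    return sum(readings), len(readings)

def parallel_mean(partitions):
    """Scatter the partial aggregate to every instance, then gather and combine.

    A mean cannot be combined from per-instance means directly (partitions differ
    in size), so each instance returns (sum, count) instead.
    """
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(local_partial, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

# Hypothetical temperature readings held by three instances.
partitions = [[10.0, 12.0], [14.0], [16.0, 18.0, 20.0]]
print(parallel_mean(partitions))  # 15.0
```

Only two numbers per instance cross the network in this pattern, which is why shared-nothing systems scale well when most work can be done locally.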

  • Partitioning:

SciDB uses chunking to partition multidimensional arrays: each instance is responsible for storing and updating its local subset of the array, and queries execute against that locally stored data. The physical platform best suited to this logical layout is a hyperscale design that couples compute with direct-attached tiered storage, ranging from SRAM, DRAM, flash SSDs, and SATA HDDs to a shared SAN RAID array for cluster management.
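To make the chunk-to-instance idea concrete, here is a minimal sketch assuming a hypothetical placement rule (a deterministic hash of chunk coordinates modulo the instance count). SciDB has its own distribution schemes; the grid size, chunk shape, and function names below are illustrative assumptions only.

```python
from collections import Counter
from itertools import product

def chunk_of(cell, chunk_shape):
    """Return the coordinates of the chunk that contains an array cell."""
    return tuple(c // s for c, s in zip(cell, chunk_shape))

def instance_of(chunk, num_instances):
    """Pick the instance that stores a chunk.

    Hypothetical rule: a deterministic polynomial hash of the chunk coordinates,
    modulo the instance count. Not SciDB's actual distribution function.
    """
    h = 0
    for c in chunk:
        h = (h * 31 + c) % 2**32
    return h % num_instances

# A 1-degree global grid (lon x lat = 360 x 180) cut into 90 x 90 chunks: 4 x 2 = 8 chunks.
shape, chunk_shape, num_instances = (360, 180), (90, 90), 4
chunks_per_dim = [(n + s - 1) // s for n, s in zip(shape, chunk_shape)]
chunks = list(product(*(range(k) for k in chunks_per_dim)))
load = Counter(instance_of(ch, num_instances) for ch in chunks)
print(sorted(load.items()))  # [(0, 2), (1, 2), (2, 2), (3, 2)]

# A point query only needs the one instance that owns the cell's chunk.
print(instance_of(chunk_of((100, 45), chunk_shape), num_instances))  # 3
```

Because placement is a pure function of chunk coordinates, any instance can compute where a cell lives without consulting a central lookup table.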


  • Chunk Size:

The SciDB administrator specifies the chunk size, and the data is split into regular, rectilinear chunks distributed uniformly across all instances, with optional chunk overlap. Because every instance receives an equal share of the work and a query completes only when its slowest instance finishes, the storage and compute instances must be homogeneous to ensure predictable and reliable performance.
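The sketch below illustrates both points under simplifying assumptions: how a one-dimensional extent splits into regular chunks with overlap (overlap cells are duplicated into neighboring chunks so window operations can stay local), and why uniform distribution plus unequal instance speeds lengthens query time. The throughput figures are hypothetical.

```python
def chunk_ranges(length, chunk, overlap):
    """Return (start, end) cell ranges, end exclusive, along one dimension.

    Each chunk owns [start, start + chunk) and also stores `overlap` cells on
    each side, clipped at the array boundary.
    """
    ranges = []
    for start in range(0, length, chunk):
        lo = max(0, start - overlap)
        hi = min(length, start + chunk + overlap)
        ranges.append((lo, hi))
    return ranges

def makespan(chunks_per_instance, chunks_per_second):
    """Query time under uniform distribution: the slowest instance decides."""
    return max(n / r for n, r in zip(chunks_per_instance, chunks_per_second))

print(chunk_ranges(10, 4, 1))  # [(0, 5), (3, 9), (7, 10)]

# Four instances with 100 chunks each (hypothetical throughputs in chunks/s).
print(makespan([100] * 4, [50, 50, 50, 50]))  # 2.0  homogeneous
print(makespan([100] * 4, [50, 50, 50, 25]))  # 4.0  one slow instance doubles the time
```

A single instance at half speed doubles the query time even though the other three sit idle for half of it, which is the practical argument for homogeneous hardware.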

  • Array Storage:

SciDB arrays consist of array chunk storage plus array metadata stored in the SciDB system catalog. Arrays are created, updated, and removed through transactions. Because these transactions span both array storage and the system catalog, and SciDB guarantees consistency of the overall database as queries execute, the underlying platform infrastructure and operations must be highly available, built on a distributed-everything architecture across the entire cluster that eliminates resource contention and enables non-disruptive operations.
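The following toy model shows why such a transaction must cover both stores: new chunks are staged first, and a single catalog update publishes them, so readers never see chunks the catalog does not describe. This is an illustrative assumption, not SciDB's implementation; the class name, the "sst" array, and the simplification that each update writes a complete new version are all hypothetical, and a real system would also need durable logging to survive crashes.

```python
class ToyArrayStore:
    """Toy model (not SciDB's implementation): an update changes chunk storage
    and the catalog together, or not at all. Each update writes a complete new
    version of the array."""

    def __init__(self):
        self.chunks = {}   # (array_name, version, chunk_coord) -> chunk data
        self.catalog = {}  # array_name -> latest committed version

    def update(self, name, new_chunks, fail_before_commit=False):
        version = self.catalog.get(name, 0) + 1
        staged = []
        try:
            # Step 1: write the new version's chunks; readers still see the old version.
            for coord, data in new_chunks.items():
                key = (name, version, coord)
                self.chunks[key] = data
                staged.append(key)
            if fail_before_commit:
                raise RuntimeError("simulated failure before commit")
            # Step 2: one catalog write publishes the new version.
            self.catalog[name] = version
        except Exception:
            # Roll back staged chunks so storage never disagrees with the catalog.
            for key in staged:
                del self.chunks[key]
            raise

    def read(self, name, coord):
        """Read a chunk from the latest committed version only."""
        return self.chunks[(name, self.catalog[name], coord)]

store = ToyArrayStore()
store.update("sst", {(0, 0): [15.1, 15.3]})
try:
    store.update("sst", {(0, 0): [99.9, 99.9]}, fail_before_commit=True)
except RuntimeError:
    pass
print(store.read("sst", (0, 0)))  # [15.1, 15.3]
```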


Results:


Cerbo IO provided a service framework that automates the provisioning and management of tiered storage, compute, and network resources for SciDB's local and Massively Parallel Processing (MPP) architecture, which enables parallel database operations and parallel math in-database. Whereas SciDB's MPP design lets queries over large datasets scale logically across larger clusters, Cerbo IO's managed service framework lets ClearScience programmatically increase the physical size of its cluster to reduce query response times.
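A back-of-the-envelope sizing model helps show how programmatic cluster growth relates to response time. The sketch assumes scan work divides evenly across instances plus a fixed per-query coordination overhead that does not shrink; the scan rate, overhead, and target below are hypothetical parameters, not measurements from this deployment.

```python
import math

def query_time(data_tb, n, sec_per_tb, overhead_s):
    """Estimated response time: scan work split evenly across n instances,
    plus a fixed coordination overhead that does not shrink with n."""
    return overhead_s + data_tb * sec_per_tb / n

def instances_needed(data_tb, target_s, sec_per_tb, overhead_s):
    """Smallest instance count n with query_time(...) <= target_s."""
    if target_s <= overhead_s:
        raise ValueError("target is at or below the fixed overhead; no cluster size reaches it")
    return max(1, math.ceil(data_tb * sec_per_tb / (target_s - overhead_s)))

# Hypothetical: one instance scans 1 TB in 600 s, 5 s fixed overhead, 120 s target.
sizes_tb = [1, 15, 150]
print([instances_needed(tb, 120, 600, 5) for tb in sizes_tb])  # [6, 79, 783]
```

Under these assumptions the instance count must grow roughly in proportion to the data to hold response time steady; real clusters scale somewhat sublinearly, so the model gives a lower bound rather than a plan.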

ClearScience's early engagement with Cerbo IO and its ultimate selection of Cerbo IO's architecture proposal have let ClearScience focus solely on its Decision Support System (DSS) application for the Advanced Climate Analysis and Forecast (ACAF) System, confident that Cerbo IO will maintain and manage its High-Performance Distributed Database Cluster environment as ClearScience's data grows from 1 TB to 15 TB to 150 TB.