ASTRON & IBM Center for Exascale Technology

Access Patterns

All emerging research infrastructures and practically all enterprises worldwide face rapidly growing storage demand. The crucial question is how to handle storage requirements that grow by terabytes to petabytes per day while addressing cost effectiveness, energy use, ease of access, and so on. Growth at this rate easily leads to billions of files, each of which also requires metadata for retrieval. Through the IBM Research work on a novel, analytics-based multi-tier storage approach, we expect that tape storage will become a competitive, easy-to-handle component of storage systems and will achieve wide acceptance. Tape libraries are usually perceived as cumbersome but will become fully automated; for LOFAR, for example, and certainly for SKA, this can mean a major breakthrough in thinking about how to deal with data storage. The inclusion of new storage technologies such as Phase Change Memory (which is about as fast as current computer memory, but denser and perhaps 3x cheaper) might change system design completely. Such breakthrough technologies do, however, require a new data life-cycle paradigm and will lead to changes in, for example, algorithm implementation.

With the LOFAR telescope, ASTRON is already generating data streams that can only just be handled by state-of-the-art technology. The next generation of research infrastructures, in particular the SKA, will require capabilities beyond the current state of the art (moving from terabytes to petabytes per day). The data needs to be stored at the lowest possible cost, archived as energy-efficiently as possible, and easily accessible to researchers at various locations worldwide.

With a novel multi-tier storage approach, we expect that introducing “smart storage analytics” with proactive capabilities for smart data placement and retrieval will become imperative in order to optimize the use of the multiple storage tiers, far beyond the current reactive tiering approaches. The suitability of different storage media depends heavily on the usage patterns when writing and reading data. Based on the findings of usage investigations, simulations, and modeling, an optimized storage architecture that includes disk, tape, and other storage media will be investigated and designed, taking into account data access requirements as well as cost and other aspects. Smart analysis, modeling, and prediction of these usage patterns will be crucial for optimal storage utilization (cost, performance, etc.). We also envision an extension to proactive learning and adjustment of data storage and access strategies depending on application or user requirements, even when these change over time.
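
To make the idea of usage-driven placement concrete, the following is a minimal sketch and not the project's actual method. It assumes a hypothetical cost model in which each tier has a storage cost per terabyte per month and an access cost per terabyte read. It predicts the next month's read count with an exponentially weighted moving average, which stands in here for whatever analytics the real system would use, and then picks the cheapest tier for that prediction. The tier names, cost figures, and function names are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_cost: float  # hypothetical cost per TB per month
    access_cost: float   # hypothetical cost per TB read (latency and energy folded in)


def monthly_cost(tier, size_tb, reads_per_month):
    """Total monthly cost of keeping a data set on a tier under the assumed model."""
    return size_tb * (tier.storage_cost + tier.access_cost * reads_per_month)


def place(tiers, size_tb, reads_per_month):
    """Pick the tier with the lowest monthly cost for the expected read rate."""
    return min(tiers, key=lambda t: monthly_cost(t, size_tb, reads_per_month))


def predict_reads(history, alpha=0.5):
    """Exponentially weighted moving average of past monthly read counts.

    A simple stand-in for usage-pattern prediction; history is ordered
    oldest to newest, and alpha weights the most recent month.
    """
    estimate = 0.0
    for count in history:
        estimate = alpha * count + (1 - alpha) * estimate
    return estimate


# Illustrative numbers only; they are not measured or quoted costs.
TIERS = [
    Tier("flash", storage_cost=20.0, access_cost=0.1),
    Tier("disk", storage_cost=5.0, access_cost=1.0),
    Tier("tape", storage_cost=1.0, access_cost=10.0),
]

if __name__ == "__main__":
    for history in ([0, 0, 0, 0], [40, 10, 0, 0], [100, 100]):
        expected = predict_reads(history)
        print(history, "->", place(TIERS, 10.0, expected).name)
```

Under these assumed costs, rarely read data lands on tape, moderately read data on disk, and heavily read data on the fastest tier. A proactive system in the sense described above would re-run the prediction as new access statistics arrive and migrate data when the cheapest tier changes.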