Introduction

Data virtualization technology allows organizations to quickly unlock the data stored in their many databases and systems. It can integrate data from multiple sources and deliver that data in many forms to many different types of data consumers, ranging from simple dashboards and reports, via Java apps running on mobile devices, to advanced forms of analytics initiated by data scientists.

Now that organizations want to become more data driven and are on their digital transformation journey, being able to exploit the data is the key to success. Unfortunately, because most of the data is deeply buried in complex applications and stored in intricate database structures, it is often not available when organizations need it. This is where data virtualization comes in. It makes it possible to unlock all the data more easily and more quickly than with most other technologies.

Table of Contents

Preface

1 Introduction to Data Virtualization

  • 1.1 What is Data Virtualization?
  • 1.2 Development With a Data Virtualization Server
  • 1.3 Clearly Defining Data Virtualization, Data Federation, and Data Integration
  • 1.4 Data Virtualization is NOT the Same as Data Federation
  • 1.5 Alternative Definition of Data Virtualization
  • 1.6 Data Virtualization is About Productivity and Agility
  • 1.7 Creating an Agile Data Integration Platform Using Data Virtualization
  • 1.8 The Network is the Database; Integrating Widely Dispersed Big Data with Data Virtualization
  • 1.9 Data Virtualization and the Fulfilling of Ted Codd’s Dream

2 Use Cases of Data Virtualization

  • 2.1 The SQL-fication of NoSQL Continues
  • 2.2 Convergence of Data Virtualization and SQL-on-Hadoop Engines
  • 2.3 Simplifying Big Data Integration with Data Virtualization
  • 2.4 Simplifying Big Data Projects with Data Virtualization
  • 2.5 Data Virtualization in the Time of Big Data
  • 2.6 Data Virtualization for Agile Business Intelligence Systems
  • 2.7 Data Virtualization for Developing Customer-Facing Apps
  • 2.8 Transparently Offloading Data Warehouse Data to Hadoop using Data Virtualization
  • 2.9 Easy Database Migration with Data Virtualization

3 The Data Delivery Platform

  • 3.1 The Flaws of the Classic Data Warehouse Architecture, Part 1
  • 3.2 The Flaws of the Classic Data Warehouse Architecture, Part 2 – The Introduction of the Data Delivery Platform
  • 3.3 The Flaws of the Classic Data Warehouse Architecture, Part 3 – The Data Delivery Platform versus the Rest of the World
  • 3.4 A Definition of the Data Delivery Platform
  • 3.5 The Requirements of the Data Delivery Platform
  • 3.6 The Data Delivery Platform: Collected Comments

4 The Logical Data Warehouse

  • 4.1 It’s Time for the Logical Data Warehouse
  • 4.2 The Logical Data Warehouse is NOT the same as Data Virtualization
  • 4.3 The Roots of the Logical Data Warehouse
  • 4.4 The Need for Flexible, Bimodal Logical Data Warehouses
  • 4.5 The Logical Data Warehouse is Tolerant to Changes
  • 4.6 The Big BI Dilemma

5 The Unified Data Delivery Platform

  • 5.1 Drowning in Data Delivery Systems
  • 5.2 Unifying Data Delivery Systems Through Data Virtualization
  • 5.3 Key Benefits of a Unified Data Delivery Platform
  • 5.4 How Siloed Data Delivery Systems Were Born
  • 5.5 Big Data is Not the Biggest Change in IT
  • 5.6 Requirements of the Unified Data Delivery Platform
  • 5.7 A Unified Data Delivery Platform—A Summary

6 The Author Rick F. van der Lans

Why this Book?

In 2012, I published a book on data virtualization entitled Data Virtualization for Business Intelligence Systems. It describes what data virtualization is, what the pros and cons are, how the tools work internally, and what possible use cases exist. Since the book was published, much has changed. Our insights on data virtualization have changed, the market has changed, we have much more practical experience with the tools, the tools themselves have matured even further, and the technology is being used in larger projects. In short, much more is known about data virtualization. Therefore, I felt it was time to write a book containing more up-to-date information, which can be seen as an addendum to my original book.

Instead of writing the book from scratch, I decided to use existing material. Therefore, the book is a bundling of articles, blogs, and whitepapers I wrote on the data virtualization topic over the last ten years. Some are included in their original form, some have been updated, and some are combined or shortened to fit this book.

Why the Title ‘Selected Writings’?

Most likely, many musicians have heroes, book authors have heroes, and sports people have heroes. I have a hero as well, a database hero. From the first day I started in IT, my hero has been C.J. (“Chris”) Date. He is the author of numerous books on databases and he has had an enormous impact on the adoption of relational databases in the market. I have always admired his work. The first book I worked on was a translation of one of his early books, Database: A Primer. Right away, it impressed me how clear, educational, and well-structured his writing was.

I have read many of his books and especially enjoyed the books in the Selected Writings series, such as Relational Database: Selected Writings, which was first published in 1986. To honor his writing I decided to follow his example, hence the title Data Virtualization: Selected Writings.

Original Articles and Blogs?

All the articles have been adapted to some degree. Because they were written over a period of ten years, the terminology was not always used consistently, so I changed that. In some articles, passages were removed because they overlapped too much with others. Originally, all the articles had to stand on their own, so some contained introductory text. I removed most of those pieces of text to avoid repetition. Reading an introduction fifteen times is no fun. Some articles I enriched with extra notes to increase their value. I have also added comments to the articles to explain why I incorporated them or to clarify some of the remarks made or concepts introduced in the article.

Why Now?

The first real article I wrote on data virtualization was published in 2009, exactly ten years ago. In the intervening years I have written countless articles, blogs, and whitepapers on this and related topics. As indicated, I even wrote a book on data virtualization. I thought that ten years was a good excuse to bundle a selection of these writings.

Moreover, it feels as if the market has adopted data virtualization. It has taken some time, but finally, companies are studying, testing, and deploying this technology. Hopefully, making many of these writings easily available helps the adopters with some of the questions they struggle with.

The third reason I decided to publish this book now is that some of the older articles and blogs are no longer available online or otherwise; now they are.

For Whom Is This Book Intended?

  • Business intelligence specialists who are responsible for developing and managing a data warehouse and business intelligence environment, and who want to know how such systems can be simplified by applying data virtualization and how data virtualization can lead to a more agile business intelligence system.
  • Information management specialists who want to know what the effect of data virtualization is on their profession, and how it will impact activities such as information management, data governance, database design, data cleansing, and data profiling.
  • Master data management specialists who are responsible for setting up a master data management system and want to know how they can benefit from deploying data virtualization.
  • Data architects who are responsible for designing an overall architecture for data delivery to any part of the organization.
  • Designers, analysts, and consultants who need to deal, directly or indirectly, with data virtualization and want to know about its possibilities and impossibilities.
  • IT students who want to know what data virtualization is and how it differs from other data-related technologies.