New Big Data Technology

Contact

R20/Consultancy

+31 252-514080

Title: New Big Database Technologies; A Market Overview of Technologies and Products

Introduction

With the rise of big data and cloud platforms, a tsunami of new technologies and products for data storage, processing, and analytics has arrived. Hadoop, Spark, NoSQL, NewSQL, triplestores, and SQL-on-Hadoop are just a few of the countless technologies that have become available for developing big data systems. Many powerful new database engines have also entered the market, including Amazon Athena, Cloudera, Exasol, Google BigQuery, Microsoft Synapse, MongoDB, Neo4j, SingleStore, SnowflakeDB, Splice Machine, and Starburst.

Most organizations have many questions. How mature are all these new technologies? Are they worthy replacements for the more traditional SQL products? How should they be incorporated into existing data warehouse architectures? Should they be used to develop data lakes? Are they the perfect platforms for data science, or for operational BI?

This seminar gives a clear, extensive, and critical overview of all the new key technologies for storing, processing, and analyzing big data. Technologies are explained, market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given. It’s the perfect update for those interested in the new market of big data technology.

Subjects

1. Big Data: State of the art

What exactly do we mean by big data?
The key application area of big data: business analytics
Differences between semi-structured, poly-structured, multi-structured, and unstructured data
Big data systems require specialization of database engines

2. Analytical SQL Database Servers

Classification of analytical SQL database servers, and can they compete with NoSQL products?
Techniques to improve performance and scalability, including column-based storage, sharding, in-memory analytics, and query compilation
How important is in-database analytics?
Is loading databases into internal memory the solution? Is it feasible?
Market overview, including Amazon Athena, Exasol, Google BigQuery, HP/Vertica, Microsoft Synapse, SingleStore, SnowflakeDB, Splice Machine, and Starburst

3. The World of Hadoop and Spark

The Hadoop stack explained: HDFS, MapReduce, Spark, Hive, HBase, YARN, ZooKeeper, Pig, HCatalog, and so on
Characteristics and consequences of HDFS and file formats
Alternative implementations by Amazon, Google, and Microsoft
Kafka for fast messaging

4. NoSQL Database Stores

Classification of NoSQL products: key-value stores, document stores, column-family stores, and graph data stores
It’s all about data scalability and performance
Why is schema-on-read more flexible than schema-on-write?
Are NoSQL products really database servers?
Market overview, including Apache CouchDB, Apache HBase, Cassandra, Cloudera, DataStax, InfiniteGraph, MongoDB, and Neo4j

5. Exploring Data in Hadoop Using SQL

Making Hadoop data available for reporting and analysis through SQL-on-Hadoop engines
Examples of SQL-on-Hadoop engines, including Apache Drill, Apache Hive, Apache Phoenix, Cloudera Impala, HP Vertica, Pivotal HAWQ, Spark SQL, and Splice Machine
Data virtualization for unleashing the information hidden in NoSQL and SQL systems

6. NewSQL Database Servers for Transaction Workloads

NewSQL database servers are designed for high-performance transactional systems
Simpler transaction mechanisms
The challenge of multi-table joins
Market overview, including CitusDB, Clustrix, and SingleStore

7. Concluding Remarks

What You Will Learn:

Why traditional database technology is not “big” enough
How analytical SQL engines can help to simplify data architectures
How Hadoop and NoSQL differ from traditional technology
How new and existing technologies such as Hadoop, NoSQL, and NewSQL can help develop BI and big data systems
How to embed Hadoop technologies in existing BI systems
How Spark can boost performance for analytics
How to distinguish between three NoSQL subcategories: key-value, document, and column-family stores
Why graph databases are very different from all other systems
When to use NewSQL or NoSQL for developing transactional systems
How to simplify data access through SQL-on-Hadoop engines
When to use which new data storage technology and the pros and cons of each solution
Which products and technologies are winners and which are losers

Geared to: IT architects; database specialists; big data specialists; BI specialists; data warehouse designers; technology planners; technical architects; enterprise architects; IT consultants; IT strategists; systems analysts; database developers; database administrators; solutions architects; data architects.

Related Articles and Blogs:

Interview with Rick van der Lans: New Technologies Complementing Traditional BI

Related Whitepapers:

SQL Syntax for Apache Drill; Using SQL for the SQL-on-Everything Engine; December 2015; sponsored by DZone

InfiniteGraph: Extending Business, Social, and Government Intelligence with Graph Analytics; September 2010; sponsored by InfiniteGraph