Title: Incorporating Big Data, Hadoop, and NoSQL in Business Intelligence Systems and Data Warehouses
Big data, Hadoop, in-memory analytics, Spark, Kafka, self-service BI, fast data, data warehouse automation, analytical database servers, data virtualization, data vault, operational intelligence, predictive analytics, and NoSQL are just a few of the new technologies and techniques that have become available for developing BI systems. Most of them are very powerful and allow for development of more flexible and scalable BI systems.
But which ones do you pick?
Due to this waterfall of new developments, it’s becoming harder and harder for organizations to select the right tools. Which technologies are relevant? Are they mature? What are their use cases? These are all valid questions, but they are difficult to answer.
This seminar gives a clear, extensive, and critical overview of all the new developments and their inter-relationships. Technologies and techniques are explained, market overviews are presented, strengths and weaknesses are discussed, and guidelines and best practices are given.
The biggest revolution in BI is evidently big data. Therefore, considerable time in the seminar is reserved for this intriguing topic. Hadoop, Spark, MapReduce, Kafka, Hive, NoSQL, and SQL-on-Hadoop are all explained. In addition, the relationship with analytics is discussed extensively.
This seminar gives you a unique opportunity to see and learn about all the new BI developments. It’s the perfect update for those interested in knowing how to make BI systems ready for the coming ten years.
1. The Changing World of Business Intelligence
Big Data: Hype or reality?
Operational intelligence: does it require online data warehouses?
Fast data is the next frontier of big data
Data warehouses in the cloud
Self-service BI
The business value of analytics
2. Hadoop Explained
The relationship between big data and analytics
The Hadoop software stack explained, including HDFS, MapReduce, YARN, Kudu, Hive, Impala, Storm, Sqoop, Flume, and HBase
The balancing act: productivity versus scalability
Making big data available to a larger audience with SQL-on-Hadoop engines, such as Apache Drill, Apache Hive, Apache Impala, Apache Phoenix, HP Vertica, IBM BigSQL, JethroData, MemSQL, SparkSQL, and Splice Machine
3. Spark Explained
Spark is about in-memory analytical processing
The interfaces: SQL, R, Scala, Python
Does Spark need Hadoop?
The relationship between Spark and data science
Examples of use cases of Spark
4. NoSQL Explained
Classification of NoSQL database servers: key-value stores, document stores, column-family stores and graph data stores
Market overview: CouchDB, Cassandra, Cloudera, MongoDB, and Neo4j
Strong consistency or eventual consistency?
Why an aggregate data model?
Use cases of NoSQL products
How to analyze data stored in NoSQL databases
5. Overview of Analytical SQL Database Servers
Are classic SQL database servers more suitable for data warehousing?
Important performance-improving features: column-oriented storage, in-database analytics
The new generation of GPU-based database servers: BlazingDB, Kinetica, MapD, and SQream
Market overview of analytical SQL database servers: Apache Greenplum, Edge Intelligence, Exasol, HP Vertica, IBM PureData Systems for Analytics, InfoBright, Kognitio WX2, Microsoft PDW, Oracle In-Memory, SAP HANA, Sybase IQ, SnowflakeDB, Teradata Appliance, and Teradata Aster Database
6. Technologies for Fast Data and Streaming Analytics
The key use case for fast data: the Internet of Things (IoT)
IoT implies streaming data and fast analysis of data - analytics at the speed of business
IoT devices: smartphones, smartwatches, RFID sensors, machines, general sensors, cameras, pacemakers, and so on
The challenge: real-time reactions to streaming data
The difference between big data and fast big data
Technologies for streaming data: Apache Kafka, Apache ActiveMQ, Amazon Kinesis, Kestrel, RabbitMQ, and ZeroMQ
Differences between these new technologies and traditional message queuing products
Products for big data streaming: Apache Storm and Flink, IBM InfoSphere Streams, Informatica for Streaming Analytics, Software AG Apama, and Spark Streaming
How to integrate fast data with the enterprise data warehouse?
7. Data Virtualization for Agile BI Systems and Lean Integration
Data virtualization offers on-demand data integration
Seamlessly integrating big data and the data warehouse
Market overview: AtScale, Denodo Platform, RedHat JBoss Data Virtualization, Rocket DV, Stone Bond Enterprise Enabler, and Tibco Data Virtualization
Importing non-relational data, such as XML documents, web services, NoSQL and Hadoop data, and unstructured data
Differences between data virtualization and data blending
8. New Business Intelligence Architectures
Discussion of different BI architectures, including Kimball’s Data Warehouse Bus Architecture, Inmon’s Corporate Information Factory, DW 2.0, the Federated Architecture, the Centralized Warehouse Architecture, the Data Virtualization Architecture, and the BI in the Cloud Architecture
Do we still need data marts?
What is the role of master data management in BI architectures?
Using data vault to create more flexible data warehouses
Data warehouse automation to create data warehouses and data marts faster
9. NewSQL Database Servers
NewSQL stands for high-performance transactional SQL database servers
Simpler transaction mechanisms to implement scale-out
What does the term geo-compliancy mean?
Market overview: Clustrix, GenieDB, NuoDB, and VoltDB
10. Data Modeling for Big Data, Hadoop, and NoSQL
Explanation of non-relational concepts, such as column families, hierarchies, sets, and lists
Is storing unstructured and semi-structured data really more flexible?
The differences between schema-on-read and schema-on-write
Rules for transforming classic data models to NoSQL concepts
Application needs influence database design
11. Closing Remarks
Learning Objectives
In this seminar, Rick van der Lans covers the following objectives:
Learn about the trends and the technological developments related to business intelligence, analytics, data warehousing, streaming analytics, and big data.
Discover the value of big data and analytics for organizations.
Learn which products and technologies are winners and which ones are losers.
Learn how new and existing technologies, such as Hadoop, NoSQL and NewSQL, will help you create new opportunities in your organization.
Learn how more agile business intelligence systems can be designed.
Learn how to embed big data and analytics in existing business intelligence architectures.
Intended Audience:
Business Intelligence Specialists, Data Warehouse Designers, Business Analysts, Technology Planners, Technical Architects, Enterprise Architects, IT Consultants, IT Strategists, Systems Analysts, Database Developers, Database Administrators, Solutions Architects, Data Architects, IT Managers
Related Whitepapers:
SQL Syntax for Apache Drill; Using SQL for the SQL-on-Everything Engine; December 2015; sponsored by DZone
How Drill Enriches Self-Service Analytics; The Added Value of a SQL-on-Everything Engine; November 2015; sponsored by MapR Technologies
SQL-on-Hadoop Engines Explained; May 2014; sponsored by MapR Technologies
SAP HANA and Data Virtualization: Competitors or Complements?; September 2012; sponsored by Cisco (Composite
Mixed, Shifting, and High-Concurrency Workloads in Data Warehouse Systems; July 2012; sponsored by Teradata
Using SQL-MapReduce for Advanced Analytical Queries - Second Edition; September 2011; sponsored by Teradata
InfiniteGraph: Extending Business, Social, and Government Intelligence with Graph Analytics; September 2010; sponsored by InfiniteGraph