Recently released: Azure Data Architecture Guide (ADAG) – 36 Articles for Data Professionals

From the Azure Data Architecture Guide:

The guide is structured around a basic pivot: the distinction between relational data and non-relational data.

Relational data is generally stored in a traditional RDBMS or a data warehouse. It has a pre-defined schema (“schema on write”) with a set of constraints to maintain referential integrity. Most relational databases use Structured Query Language (SQL) for querying. Solutions that use relational databases include online transaction processing (OLTP) and online analytical processing (OLAP).

Non-relational data is any data that does not use the relational model found in traditional RDBMSs. This may include key-value data, JSON data, graph data, time series data, and other data types. The term NoSQL refers to databases that are designed to hold various types of non-relational data. However, the term is not entirely accurate, because many non-relational data stores support SQL-compatible queries. Non-relational data and NoSQL databases often come up in discussions of big data solutions. A big data architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems.

Within each of these two main categories, the Data Architecture Guide contains the following sections:

  • Concepts. Overview articles that introduce the main concepts you need to understand when working with this type of data.
  • Scenarios. A representative set of data scenarios, including a discussion of the relevant Azure services and the appropriate architecture for the scenario.
  • Technology choices. Detailed comparisons of various data technologies available on Azure, including open source options. Within each category, we describe the key selection criteria and a capability matrix, to help you choose the right technology for your scenario.

This guide is not intended to teach you data science or database theory — you can find entire books on those subjects. Instead, the goal is to help you select the right data architecture or data pipeline for your scenario, and then select the Azure services and technologies that best fit your requirements. If you already have an architecture in mind, you can skip directly to the technology choices.

The guide's contents:

  • Traditional RDBMS: Concepts, Scenarios
  • Big data and NoSQL: Concepts, Scenarios
  • Cross-cutting concerns


Technical Reference Implementation for Enterprise BI and Reporting

Azure offers a rich data and analytics platform for customers and ISVs seeking to build scalable BI and reporting solutions. However, customers face pragmatic challenges in building the right infrastructure for enterprise-grade production systems. They have to evaluate the various products for security, scale, performance and geo-availability requirements. They have to understand service features and their interoperability, and they must plan to address any perceived gaps using custom software. This takes time, effort, and many times, the end-to-end system architecture they design is sub-optimal.

ref: https://github.com/Azure/azure-arch-enterprise-bi-and-reporting/blob/master/README.md

Our solution included in Microsoft Ignite 2017 Keynote

Azure Cosmos DB, Azure DW, Machine Learning, Deep Learning, Neural Networks, TensorFlow, SQL Server, ASP.NET Core… are just a few of the components that make up one of the solutions we are currently developing.

I have been under a social media embargo until today, but now that the Microsoft Ignite 2017 keynote has taken place, I can proudly say that the solution our team has been working on for some time was part of the keynote addresses.

During the second keynote, led by Scott Guthrie, Danielle Dean, a Data Scientist Lead, discussed at a high level one of the solutions we are developing at Jabil, which involves advanced image recognition of circuit board issues. The keynote focused on the data science portion of the solution and introduced the new Azure Machine Learning Workbench to the packed audience.

Tomorrow morning there is a session, “Using big data, the cloud, and AI to enable intelligence at scale” (Tuesday, September 26, from 9:00 AM to 10:15 AM, in Hyatt Regency Windermere X), during which we will go into a bit more detail, and the team at Microsoft will expand on the new AI and big data machine learning capabilities (session details via this link).

Visual Studio 2017 version 15.3 Release Notes

Release Date: August 18, 2017 – Visual Studio 2017 version 15.3.1

Issues Fixed in August 18, 2017 Release

The customer-reported issues addressed in this version are listed in the full release notes, linked below.

Summary: What’s New in this Release

  • Accessibility Improvements make Visual Studio more accessible than ever.
  • Azure Function Tools are included in the Azure development workload. You can develop Azure Function applications locally and publish directly to Azure.
  • You can now build applications in Visual Studio 2017 that run on Azure Stack and government clouds, like Azure in China.
  • We improved .NET Core development support for .NET Core 2.0, and Windows Nano Server containers.
  • In the Visual Studio IDE, we improved sign-in and identity, the start page, Lightweight Solution Load, and the setup CLI. We also improved refactoring, code generation, and Quick Actions.
  • The Visual Studio Editor has better accessibility due to the new ‘Blue (Extra Contrast)’ theme and improved screen reader support.
  • We improved the Debugger and diagnostics experience. This includes Point and Click to Set Next Statement, the ability to refresh all nested values in the variable windows, and Open Folder debugging improvements.
  • Xamarin has a new standalone editor for editing app entitlements.
  • The Open Folder and CMake Tooling experience is updated. You can now use CMake 3.8.
  • We made improvements to the IntelliSense engine, and to the project and the code wizards for C++ Language Services.
  • Visual C++ Toolset supports command-prompt initialization targeting.
  • We added the ability to use C# 7.1 language features (see the short sketch after this list).
  • You can install TypeScript versions independent of Visual Studio updates.
  • We added support for Node 8 debugging.
  • NuGet has added support for new TFMs (netcoreapp2.0, netstandard2.0, Tizen), Semantic Versioning 2.0.0, and MSBuild integration of NuGet warnings and errors.
  • Visual Studio now offers .NET Framework 4.7 development tools to supported platforms with 4.7 runtime included.
  • We added clusters of related events to the search query results in the Application Insights Search tool.
  • We improved syntax support for SQL Server 2016 in Redgate SQL Search.
  • We enabled support for Microsoft Graph APIs in Connected Services.
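
To illustrate the C# 7.1 item above, here is a minimal sketch (my own example, not from the release notes) showing three of the new language features; note that C# 7.1 has to be opted into, for example by setting <LangVersion>7.1</LangVersion> in the project file:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // C# 7.1: async Main entry points are now allowed.
    static async Task Main()
    {
        // C# 7.1: the "default" literal infers its type from the target (int 0 here).
        int retries = default;

        // C# 7.1: tuple element names are inferred from the variable names.
        var name = "builds";
        var count = 3 + retries;
        var stats = (name, count);

        Console.WriteLine($"{stats.name}: {stats.count}"); // prints "builds: 3"
        await Task.Delay(100); // placeholder async work
    }
}
```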

Read more at https://www.visualstudio.com/en-gb/news/releasenotes/vs2017-relnotes#15.3.26730.08


Cosmos DB Change Feed Processor NuGet package now available & Working with the change feed support

Cosmos DB Change Feed Processor NuGet package now available

Many database systems have features allowing change data capture or mirroring, for use with live backups, reporting, data warehousing, and real-time analytics for transactional systems… Azure Cosmos DB has such a feature, called the Change Feed API, which was first introduced in May 2017.

The Change Feed API provides a list of new and updated documents in a partition in the order in which the updates were made.
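
To make that concrete, here is a minimal sketch using the DocumentDB .NET SDK (Microsoft.Azure.DocumentDB) as it stood in 2017; the database and collection names are placeholders. It enumerates the collection's partition key ranges and then drains the change feed for each range:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Documents;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;

class ChangeFeedSample
{
    static async Task ReadChangeFeedAsync(DocumentClient client)
    {
        Uri collectionUri = UriFactory.CreateDocumentCollectionUri("db", "coll"); // placeholders

        // 1. Enumerate the collection's partition key ranges.
        var ranges = new List<PartitionKeyRange>();
        string continuation = null;
        do
        {
            FeedResponse<PartitionKeyRange> response = await client.ReadPartitionKeyRangeFeedAsync(
                collectionUri, new FeedOptions { RequestContinuation = continuation });
            ranges.AddRange(response);
            continuation = response.ResponseContinuation;
        }
        while (continuation != null);

        // 2. Read the change feed of each partition key range, in update order.
        foreach (PartitionKeyRange range in ranges)
        {
            IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
                collectionUri,
                new ChangeFeedOptions { PartitionKeyRangeId = range.Id, StartFromBeginning = true });

            while (query.HasMoreResults)
            {
                foreach (Document doc in await query.ExecuteNextAsync<Document>())
                {
                    Console.WriteLine($"Changed document: {doc.Id}");
                }
            }
        }
    }
}
```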

Microsoft has recently introduced the new Change Feed Processor Library, which abstracts the existing Change Feed API to facilitate distributing change feed event processing across multiple consumers.

The Change Feed Processor library provides a thread-safe, multi-process runtime environment with checkpoint and partition lease management for change feed operations.

The Change Feed Processor Library is available as a NuGet package for .NET development. The library makes actions like these easier: reading changes from a change feed across multiple partitions, and performing computational actions triggered by the change feed in parallel (also known as complex event processing).
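
As a rough sketch of what a consumer looks like with the library's v1 API (the account URI, key, and database/collection names below are placeholders; the "leases" collection is where the library keeps its checkpoint and lease state):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using DocumentDB.ChangeFeedProcessor;
using Microsoft.Azure.Documents;

// The library calls an observer with each batch of changes from the partitions it leases.
class ConsoleObserver : IChangeFeedObserver
{
    public Task OpenAsync(ChangeFeedObserverContext context) => Task.CompletedTask;

    public Task CloseAsync(ChangeFeedObserverContext context, ChangeFeedObserverCloseReason reason)
        => Task.CompletedTask;

    public Task ProcessEventsAsync(IReadOnlyList<Document> docs, ChangeFeedObserverContext context)
    {
        foreach (Document doc in docs)
        {
            Console.WriteLine($"Changed document: {doc.Id}");
        }
        return Task.CompletedTask;
    }
}

class Host
{
    static async Task RunAsync()
    {
        var monitoredCollection = new DocumentCollectionInfo
        {
            Uri = new Uri("https://<account>.documents.azure.com:443/"), // placeholder
            MasterKey = "<key>",                                         // placeholder
            DatabaseName = "db",
            CollectionName = "coll"
        };

        // Leases record how far each consumer has read, enabling checkpointing and
        // distribution of partitions across multiple host instances.
        var leaseCollection = new DocumentCollectionInfo
        {
            Uri = monitoredCollection.Uri,
            MasterKey = monitoredCollection.MasterKey,
            DatabaseName = "db",
            CollectionName = "leases"
        };

        var host = new ChangeFeedEventHost("host-1", monitoredCollection, leaseCollection);
        await host.RegisterObserverAsync<ConsoleObserver>();

        Console.ReadLine(); // process changes until Enter is pressed
        await host.UnregisterObserversAsync();
    }
}
```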

Judy Shen from the Microsoft Cosmos DB team has published some sample code on GitHub, demonstrating its use.

Working with the change feed support in Azure Cosmos DB

Aravind Ramachandran, Mimi Gentz, and Judy Shen also published an article, Working with the change feed support in Azure Cosmos DB, on the Azure docs site a few days ago…


Azure Cosmos DB is a fast and flexible globally replicated database service that is used for storing high-volume transactional and operational data with predictable single-digit millisecond latency for reads and writes. This makes it well-suited for IoT, gaming, retail, and operational logging applications. A common design pattern in these applications is to track changes made to Azure Cosmos DB data, and update materialized views, perform real-time analytics, archive data to cold storage, and trigger notifications on certain events based on these changes. The change feed support in Azure Cosmos DB enables you to build efficient and scalable solutions for each of these patterns.

With change feed support, Azure Cosmos DB provides a sorted list of documents within an Azure Cosmos DB collection in the order in which they were modified. This feed can be used to listen for modifications to data within the collection and perform actions such as:

  • Trigger a call to an API when a document is inserted or modified
  • Perform real-time (stream) processing on updates
  • Synchronize data with a cache, search engine, or data warehouse

Changes in Azure Cosmos DB are persisted and can be processed asynchronously, and distributed across one or more consumers for parallel processing. Let’s look at the APIs for change feed and how you can use them to build scalable real-time applications. This article shows how to work with Azure Cosmos DB change feed and the DocumentDB API.

[Figure: Azure Cosmos DB change feed]

Note
Change feed support is only provided for the DocumentDB API at this time; the Graph API and Table API are not currently supported.

Use cases and scenarios
Change feed allows for efficient processing of large datasets with a high volume of writes, and offers an alternative to querying entire datasets to identify what has changed. For example, you can perform the following tasks efficiently:

  • Update a cache, search index, or a data warehouse with data stored in Azure Cosmos DB.
  • Implement application-level data tiering and archival, that is, store “hot data” in Azure Cosmos DB, and age out “cold data” to Azure Blob Storage or Azure Data Lake Store.
  • Implement batch analytics on data using Apache Hadoop.
  • Implement lambda pipelines on Azure with Azure Cosmos DB. Azure Cosmos DB provides a scalable database solution that can handle both ingestion and query, and implement lambda architectures with low TCO.
  • Perform zero down-time migrations to another Azure Cosmos DB account with a different partitioning scheme.

[Figure: Lambda pipelines with Azure Cosmos DB for ingestion and query]

You can use Azure Cosmos DB to receive and store event data from devices, sensors, infrastructure, and applications, and process these events in real-time with Azure Stream Analytics, Apache Storm, or Apache Spark.
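
On the ingestion side of such a pipeline, landing an event in Cosmos DB is a single call with the DocumentDB .NET SDK; in this minimal sketch the database name, collection name, and event shape are all illustrative, and downstream consumers would pick the document up from the change feed:

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Documents.Client;

class IngestSample
{
    // Illustrative event shape; any JSON-serializable type can be stored.
    class SensorReading
    {
        public string DeviceId { get; set; }
        public double Temperature { get; set; }
        public DateTime Timestamp { get; set; }
    }

    static async Task IngestAsync(DocumentClient client)
    {
        Uri eventsCollection = UriFactory.CreateDocumentCollectionUri("db", "events"); // placeholders

        // Each write becomes visible to change feed consumers (Stream Analytics,
        // Storm/Spark jobs, or a Change Feed Processor host) in update order.
        await client.CreateDocumentAsync(eventsCollection, new SensorReading
        {
            DeviceId = "device-42",
            Temperature = 21.7,
            Timestamp = DateTime.UtcNow
        });
    }
}
```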

Within web and mobile apps, you can track events such as changes to your customer’s profile, preferences, or location to trigger certain actions like sending push notifications to their devices using Azure Functions or App Services. If you’re using Azure Cosmos DB to build a game, you can, for example, use change feed to implement real-time leaderboards based on scores from completed games.

Read more at https://docs.microsoft.com/en-gb/azure/cosmos-db/change-feed

MEAN.js with Cosmos DB on Azure

(a YouTube series by John Papa)

Cosmos DB is of significant interest to me for projects I have been engaged in for the past couple of years, which use MongoDB and MEAN in several ways. Scaling MongoDB has always been a bit of a pain for us, and Cosmos DB on Azure looks set to relieve a lot of the headaches we have had.

MEAN stands for MongoDB, Express, Angular and Node.

I am not the author of these; this is a reference list for a YouTube series by John Papa introducing MEAN with Cosmos DB on Azure. I would normally link directly to the creator's blog post for a series such as this, but it seems to be offline just now, so I thought I would share a full list of the current videos here; hopefully the original link, https://johnpapa.net/angular-cosmosdb-1/, will work again soon.


MEAN.js with Cosmos DB – Part 1: Introduction

John builds a lot of apps with MongoDB, Express, Angular and Node (MEAN). MongoDB just works so well with these, but recently he has been using Cosmos DB on Azure in its place, because it is easy to use and scale, is super fast, and he does not have to change how he codes.


MEAN.js with Cosmos DB – Part 2: Creating the Node.js and Express App

Creating a Node.js and Express app along with the Angular CLI, then creating a web API endpoint and trying it out.


MEAN.js with Cosmos DB – Part 3: Angular and Express APIs

The A in MEAN stands for Angular. This video shows how to build an Angular UI that talks to the Express API, with GET, POST, PUT, and DELETE.


MEAN.js with Cosmos DB – Part 4: Creating and Deploying Cosmos DB

Using the Azure CLI to create a Cosmos DB account representing a MongoDB model database and deploy it to Azure, then viewing what was created in the Azure portal.


MEAN.js with Cosmos DB – Part 5: Querying Cosmos DB

How to connect to the MongoDB model database in Azure Cosmos DB using Mongoose, and query it for data.

You can subscribe to John's YouTube series at https://www.youtube.com/playlist?list=PLbnXt_I6OfBWU9JiDNewZm11-7eFQf70M or follow him on Twitter at @John_Papa.