Cosmos DB Change Feed Processor NuGet package now available & Working with the change feed support

Cosmos DB Change Feed Processor NuGet package now available

Many database systems have features allowing change data capture or mirroring, for use with live backups, reporting, data warehousing and real-time analytics for transactional systems… Azure Cosmos DB has such a feature, called the Change Feed API, which was first introduced in May 2017.

The Change Feed API provides a list of new and updated documents in a partition in the order in which the updates were made.

Microsoft has recently introduced the new Change Feed Processor Library, which abstracts the existing Change Feed API to make it easier to distribute change feed event processing across multiple consumers.

The Change Feed Processor library provides a thread-safe, multi-process runtime environment with checkpointing and partition lease management for change feed operations.

The Change Feed Processor Library is available as a NuGet package for .NET development. The library makes it easier to read changes from a change feed across multiple partitions, and to perform computational actions triggered by the change feed in parallel (also known as complex event processing).

Judy Shen from the Microsoft Cosmos DB team has published some sample code on GitHub, demonstrating its use.
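
That sample targets .NET. For a rough feel for the raw Change Feed API that the processor library abstracts, here is a minimal single-consumer sketch using the azure-cosmos Python package instead (assuming the v4 SDK; the endpoint, key, and database/collection names are placeholders):

```python
# A minimal sketch of reading the raw change feed with the Python
# azure-cosmos SDK (v4). Endpoint, key, database, and collection
# names are placeholders -- substitute your own.
from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/",
                      credential="<account-key>")
container = client.get_database_client("mydb").get_container_client("mycoll")

# Replay every insert and update in the collection, oldest first.
for doc in container.query_items_change_feed(is_start_from_beginning=True):
    print(doc["id"])  # each item is the latest version of a changed document
```

The processor library's value is automating what this loop leaves out: persisting checkpoints so a restarted consumer resumes where it left off, and leasing partitions so the work can be spread across several consumers.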

Working with the change feed support in Azure Cosmos DB

Aravind Ramachandran, Mimi Gentz and Judy Shen also published an article, “Working with the change feed support in Azure Cosmos DB”, on the Azure docs site a few days ago…

2017-07-24

Azure Cosmos DB is a fast and flexible globally replicated database service that is used for storing high-volume transactional and operational data with predictable single-digit millisecond latency for reads and writes. This makes it well-suited for IoT, gaming, retail, and operational logging applications. A common design pattern in these applications is to track changes made to Azure Cosmos DB data, and update materialized views, perform real-time analytics, archive data to cold storage, and trigger notifications on certain events based on these changes. The change feed support in Azure Cosmos DB enables you to build efficient and scalable solutions for each of these patterns.

With change feed support, Azure Cosmos DB provides a sorted list of documents within an Azure Cosmos DB collection in the order in which they were modified. This feed can be used to listen for modifications to data within the collection and perform actions such as:

  • Trigger a call to an API when a document is inserted or modified
  • Perform real-time (stream) processing on updates
  • Synchronize data with a cache, search engine, or data warehouse

Changes in Azure Cosmos DB are persisted and can be processed asynchronously, and distributed across one or more consumers for parallel processing. Let’s look at the APIs for change feed and how you can use them to build scalable real-time applications. This article shows how to work with Azure Cosmos DB change feed and the DocumentDB API.

[Diagram: Azure Cosmos DB change feed]
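
As a concrete instance of the third pattern in the list above (synchronizing a cache, search engine, or data warehouse), replaying the feed into a keyed store is often enough, since the feed always yields the latest version of each changed document. Here is a minimal sketch, with an in-memory dict standing in for the real target and container an azure-cosmos container client as in the earlier snippet:

```python
# Sketch: keeping a materialized view in sync from the change feed.
# The dict stands in for a real cache, search index, or warehouse
# staging table; "container" is an azure-cosmos container client
# created as in the earlier snippet.
def sync_view(container, view):
    # The feed is ordered by modification time and yields the latest
    # version of each inserted or updated document, so a plain upsert
    # per document is sufficient.
    for doc in container.query_items_change_feed(is_start_from_beginning=True):
        view[doc["id"]] = doc
    return view

materialized_view = sync_view(container, {})
```

One caveat worth knowing: the change feed surfaces inserts and updates but not deletes, so deletions are usually modelled with a “soft delete” flag on the document that consumers treat as a removal.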

Note
Change feed support is only provided for the DocumentDB API at this time; the Graph API and Table API are not currently supported.

Use cases and scenarios
Change feed allows for efficient processing of large datasets with a high volume of writes, and offers an alternative to querying entire datasets to identify what has changed. For example, you can perform the following tasks efficiently:

  • Update a cache, search index, or a data warehouse with data stored in Azure Cosmos DB.
  • Implement application-level data tiering and archival, that is, store “hot data” in Azure Cosmos DB, and age out “cold data” to Azure Blob Storage or Azure Data Lake Store.
  • Implement batch analytics on data using Apache Hadoop.
  • Implement lambda pipelines on Azure with Azure Cosmos DB. Azure Cosmos DB provides a scalable database solution that can handle both ingestion and query, letting you implement lambda architectures with low TCO.
  • Perform zero down-time migrations to another Azure Cosmos DB account with a different partitioning scheme.

[Diagram: Lambda pipelines with Azure Cosmos DB for ingestion and query]

You can use Azure Cosmos DB to receive and store event data from devices, sensors, infrastructure, and applications, and process these events in real-time with Azure Stream Analytics, Apache Storm, or Apache Spark.

Within web and mobile apps, you can track events such as changes to your customer’s profile, preferences, or location to trigger certain actions like sending push notifications to their devices using Azure Functions or App Services. If you’re using Azure Cosmos DB to build a game, you can, for example, use change feed to implement real-time leaderboards based on scores from completed games.

Read more at https://docs.microsoft.com/en-gb/azure/cosmos-db/change-feed

Development Workflows for Data Scientists • Free eBook

Enabling Fast, Efficient, and Reproducible Results for Data Science • via GitHub

GitHub partnered with O’Reilly Media to examine how data science and analytics teams at several data-driven organizations are improving the way they define, enforce, and automate development workflows.

Download this complimentary book from: – https://resources.github.com/whitepapers/data-science/ 

Power BI custom visual from Visio

Visualize business process workflows, real-world layouts like factory floor plans, network diagrams, organization structures, or any illustration created in Microsoft Visio, and easily connect it to Power BI data. Contextually represent Power BI data as colours or text on Visio diagrams. Now drive operational intelligence effectively using the Visio custom visual.

Second version of HoloLens HPU will incorporate AI coprocessor for implementing DNNs


Posted July 23, 2017 | by Microsoft Research Blog

By Marc Pollefeys, Director of Science, HoloLens

It is not an exaggeration to say that deep learning has taken the world of computer vision, and many other recognition tasks, by storm. Many of the most difficult recognition problems have seen astonishing gains over the past few years.

Although we have seen large improvements in the accuracy of recognition as a result of Deep Neural Networks (DNNs), deep learning approaches have two well-known challenges: they require large amounts of labelled data for training, and they require a type of compute that is not amenable to current general purpose processor/memory architectures. Some companies have responded with architectures designed to address the particular type of massively parallel compute required for DNNs, including our own use of FPGAs, for example, but to date these approaches have primarily enhanced existing cloud computing fabrics.

But I work on HoloLens, and in HoloLens, we’re in the business of making untethered mixed reality devices. We put the battery on your head, in addition to the compute, the sensors, and the display. Any compute we want to run locally for low-latency, which you need for things like hand-tracking, has to run off the same battery that powers everything else. So what do you do?

You create custom silicon to do it.

First, a bit of background. HoloLens contains a custom multiprocessor called the Holographic Processing Unit, or HPU. It is responsible for processing the information coming from all of the on-board sensors, including Microsoft’s custom time-of-flight depth sensor, head-tracking cameras, the inertial measurement unit (IMU), and the infrared camera. The HPU is part of what makes HoloLens the world’s first–and still only–fully self-contained holographic computer.

Today, Harry Shum, executive vice president of our Artificial Intelligence and Research Group, announced in a keynote speech at CVPR 2017 that the second version of the HPU, currently under development, will incorporate an AI coprocessor to natively and flexibly implement DNNs. The chip supports a wide variety of layer types, fully programmable by us. Harry showed an early spin of the second version of the HPU running live code implementing hand segmentation.

The AI coprocessor is designed to work in the next version of HoloLens, running continuously, off the HoloLens battery. This is just one example of the new capabilities we are developing for HoloLens, and is the kind of thing you can do when you have the willingness and capacity to invest for the long term, as Microsoft has done throughout its history. And this is the kind of thinking you need if you’re going to develop mixed reality devices that are themselves intelligent. Mixed reality and artificial intelligence represent the future of computing, and we’re excited to be advancing this frontier.

Source: https://www.microsoft.com/en-us/research/blog/second-version-hololens-hpu-will-incorporate-ai-coprocessor-implementing-dnns/

AI for security: Microsoft Security Risk Detection makes debut

Full details at: – https://blogs.microsoft.com/next/2017/07/21/ai-for-security-microsoft-security-risk-detection-makes-debut/

Microsoft is making generally available a cloud service that uses artificial intelligence to track down bugs in software, and it will begin offering a preview version of the tool for Linux users as well.

Microsoft Security Risk Detection, previously known as Project Springfield, is a cloud-based tool that developers can use to look for bugs and other security vulnerabilities in the software they are preparing to release or use. The tool is designed to catch the vulnerabilities before the software goes out the door, saving companies the heartache of having to patch a bug, deal with crashes or respond to an attack after it has been released.

Free course on Deep Learning for Self-Driving Cars

A free course and introduction to deep learning through the applied task of building a self-driving car. Taught by Lex Fridman.

Visit http://selfdrivingcars.mit.edu/ for full details of “MIT 6.S094: Deep Learning for Self-Driving Cars”.

Data Science: Performance of Python vs Pandas vs Numpy

Re-post from http://machinelearningexp.com/data-science-performance-of-python-vs-pandas-vs-numpy/

Speed is a key factor for any data scientist. In business, you do not usually work with toy datasets of a few thousand samples. It is more likely that your datasets will contain millions, or even hundreds of millions, of samples. Customer orders, web logs, billing events, stock prices – datasets now are huge.

I assume you do not want to spend hours or days waiting for your data processing to complete. The biggest dataset I have worked with so far contained over 30 million records. When I ran my data processing script for this dataset the first time, the estimated time to complete was around 4 days! I do not have a very powerful machine (a MacBook Air with an i5 and 4 GB of RAM), but the most I could accept was running the script over one night, not multiple days.

Thanks to some clever tricks, I was able to cut this running time down to a few hours. This post explains the first step to achieving good data processing performance: choosing the right library/framework for your dataset.

The graph below shows the results of my experiment (details below), calculated as processing speed measured against the processing speed of pure Python.

[Chart: Python vs NumPy vs Pandas]

As you can see, NumPy’s performance is several times greater than that of Pandas. I personally love Pandas for simplifying many tedious data science tasks, and I use it wherever I can. But if the expected processing time stretches to many hours, then, with regret, I switch from Pandas to NumPy.

I am very aware that actual performance may vary significantly depending on the task and the type of processing, so please treat these results as indicative only. There is no single test that can show an “overall” comparison of performance for any set of software tools.
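
For a flavour of how such a comparison is set up, here is a toy benchmark (not the original post’s experiment) that times one representative operation, summing a million values, in pure Python, Pandas and NumPy:

```python
# Toy benchmark: pure Python vs Pandas vs NumPy on one simple task.
import timeit

import numpy as np
import pandas as pd

values = np.random.rand(1_000_000)   # one million random floats
as_list = values.tolist()            # pure-Python list of floats
as_series = pd.Series(values)        # Pandas Series over the same data

print("pure Python:", timeit.timeit(lambda: sum(as_list), number=10))
print("Pandas:     ", timeit.timeit(lambda: as_series.sum(), number=10))
print("NumPy:      ", timeit.timeit(lambda: values.sum(), number=10))
```

On most machines the NumPy line comes out fastest by a wide margin, for the same underlying reason as in the post’s results: the summation runs in optimized native code over a contiguous array rather than looping over boxed Python objects.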

Posted on July 15, 2017

see full post @ http://machinelearningexp.com/data-science-performance-of-python-vs-pandas-vs-numpy/