Data Science: Performance of Python vs Pandas vs Numpy

Re-post from http://machinelearningexp.com/data-science-performance-of-python-vs-pandas-vs-numpy/

Speed is a key factor for any data scientist. In business, you do not usually work with toy datasets of a few thousand samples. It is more likely that your datasets will contain millions or hundreds of millions of samples. Customer orders, web logs, billing events, stock prices – datasets are now huge.

I assume you do not want to spend hours or days waiting for your data processing to complete. The biggest dataset I have worked with so far contained over 30 million records. When I first ran my data processing script on this dataset, the estimated time to complete was around 4 days! I do not have a very powerful machine (a Macbook Air with an i5 and 4 GB of RAM), but the most I could accept was running the script overnight, not for multiple days.

Thanks to some clever tricks, I was able to decrease this running time to a few hours. This post explains the first step towards good data processing performance: choosing the right library/framework for your dataset.

The graph below shows the results of my experiment (details in the full post), expressed as processing speed relative to the processing speed of pure Python.

Python vs Numpy vs Pandas

As you can see, Numpy is several times faster than Pandas. I personally love Pandas for simplifying many tedious data science tasks, and I use it wherever I can. But if the expected processing time runs to many hours, then, with regret, I switch from Pandas to Numpy.

I am very aware that actual performance may vary significantly depending on the task and the type of processing. So please treat these results as indicative only. There is no single test that can show an “overall” comparison of performance for any set of software tools.
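To give a feel for how such a comparison can be set up, here is a minimal sketch. It is not the experiment from the original post: the task (a sum of squares), the function names and the dataset size are all assumptions chosen for illustration, and the relative speeds you get will depend on your machine and on the task.

```python
import timeit

import numpy as np
import pandas as pd


def sum_of_squares_python(values):
    """Pure Python: loop over a list of floats."""
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_pandas(series):
    """Pandas: vectorised arithmetic on a Series."""
    return float((series * series).sum())


def sum_of_squares_numpy(array):
    """Numpy: dot product of the array with itself."""
    return float(np.dot(array, array))


def compare(n=100_000, repeats=5):
    """Return the speed of each approach relative to pure Python (python == 1.0)."""
    rng = np.random.default_rng(0)  # fixed seed so every run uses the same data
    array = rng.random(n)
    series = pd.Series(array)
    values = array.tolist()

    candidates = {
        "python": lambda: sum_of_squares_python(values),
        "pandas": lambda: sum_of_squares_pandas(series),
        "numpy": lambda: sum_of_squares_numpy(array),
    }
    # Take the best of several runs to reduce noise from other processes.
    timings = {
        name: min(timeit.repeat(func, number=1, repeat=repeats))
        for name, func in candidates.items()
    }
    baseline = timings["python"]
    return {name: baseline / elapsed for name, elapsed in timings.items()}


if __name__ == "__main__":
    for name, speedup in compare().items():
        print(f"{name}: {speedup:.1f}x the speed of pure Python")
```

A single arithmetic reduction like this one favours vectorised libraries heavily; a task dominated by string handling or row-by-row logic could give a very different picture.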

Posted on July 15, 2017 by

see full post @ http://machinelearningexp.com/data-science-performance-of-python-vs-pandas-vs-numpy/

Amazon brings .Net Core support to AWS Cloud

Re-post from http://opensourceforu.com/2017/07/amazon-brings-net-core-support-aws-cloud/

To encourage developers to build cross-platform applications, Amazon has added .Net Core support to its AWS Cloud services. The services upgraded with the new support are AWS CodeStar and AWS CodeBuild.

“The support for .Net Core in AWS CodeStar and AWS CodeBuild opens the door for .Net developers to take advantage of the benefits of Continuous Integration and Delivery when building .Net based solutions on AWS,” said Tara Walker, technical evangelist, Amazon Web Services (AWS), in a statement.

The AWS team launched the CodeStar service back in April for Amazon EC2, AWS Elastic Beanstalk and AWS Lambda projects, with support for five programming languages: JavaScript, Java, Python, Ruby and PHP. Although the original list of supported languages already covered a large share of developers, Amazon is now targeting developers on Microsoft’s Azure by enabling .Net Core support.

Deploy code on Amazon EC2 and AWS Lambda

Developers can use the new support to build and deploy their .Net Core application code to both Amazon EC2 and AWS Lambda. This ability comes through the CodeBuild service, which brings two new project templates for .Net Core applications to AWS CodeStar. Sample code and a full software development toolchain are also included to ease development.

Importantly, Visual Studio 2017 is required, alongside the AWS Toolkit for Visual Studio 2017, to start building .Net Core applications for Amazon’s cloud solution. You can also deploy your existing .Net Core code to enable your applications on AWS.

by  on July 13, 2017

Angular 4.3 Now Available

AngularJS
Re-post from http://angularjs.blogspot.co.uk/2017/07/angular-43-now-available.html

Angular version 4.3 has been released. This is a minor release following our announced adoption of Semantic Versioning, meaning that it contains no breaking changes and that it is a drop-in replacement for 4.x.x.

What’s new?
  • We are introducing HttpClient, a smaller, easier to use, and more powerful library for making HTTP Requests. Learn more about it from our docs
  • New router life cycle events for Guards and Resolvers. Four new events, GuardsCheckStart, GuardsCheckEnd, ResolveStart and ResolveEnd, join the existing set of life cycle events such as NavigationStart
  • Conditionally disable animations via a new attribute, [@.disabled]
  • Support for the emulated /deep/ CSS Selector (the Shadow-Piercing descendant combinator aka >>>) has been deprecated to match browser implementations and Chrome’s intent to remove. ::ng-deep has been added to provide a temporary workaround for developers currently using this feature.
For the complete list of features and bugfixes please see the changelog.

Azure Cosmos DB with Scott Hanselman

Published on Jun 27, 2017

Kirill Gavrylyuk stops by Azure Friday to talk Cosmos DB with Scott Hanselman.

Watch this quick overview of the industry’s first globally distributed multi-model database service followed by a demo of moving an existing MongoDB app to Cosmos DB with a single config change.

For more information, see: https://azure.microsoft.com/en-us/services/cosmos-db/

Querying Azure Cosmos DB resources using the REST API

Azure Cosmos DB is a globally distributed multi-model database with support for multiple APIs. The following article describes how to query resources over REST using the Azure Cosmos DB API: https://docs.microsoft.com/en-us/rest/api/documentdb/querying-documentdb-resources-using-the-rest-api

All Cosmos DB resources (with the exception of account resources) can be queried using the Azure Cosmos DB SQL language. See Query with Azure Cosmos DB SQL for additional details on syntax: http://azure.microsoft.com/documentation/articles/documentdb-sql-query

For a full sample using .NET visit https://github.com/Azure/azure-documentdb-dotnet/tree/master/samples/rest-from-.net
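As a rough illustration in Python, the sketch below builds (but does not send) a query request against a collection, following the REST query conventions as I understand them from the linked documentation: a POST to the collection's docs feed, a JSON body with a query and parameters, a query-specific content type, and a master-key authorization token. The account name, database and collection ids, the key, the API version string and all function names are assumptions for illustration only; check the linked article before relying on any of them.

```python
import base64
import hashlib
import hmac
import json
import urllib.parse
import urllib.request
from email.utils import formatdate


def build_auth_token(verb, resource_type, resource_link, date, master_key):
    """Build a master-key authorization token (hypothetical helper)."""
    key = base64.b64decode(master_key)
    # Verb, resource type and date are lowercased; the resource link keeps its case.
    payload = (
        f"{verb.lower()}\n{resource_type.lower()}\n"
        f"{resource_link}\n{date.lower()}\n\n"
    )
    digest = hmac.new(key, payload.encode("utf-8"), hashlib.sha256).digest()
    signature = base64.b64encode(digest).decode("utf-8")
    # The whole token is URL-encoded before it goes into the header.
    return urllib.parse.quote(f"type=master&ver=1.0&sig={signature}", safe="")


def build_query_request(account, database_id, collection_id, master_key,
                        query, parameters=None, date=None):
    """Return an unsent urllib Request that queries documents in a collection."""
    resource_link = f"dbs/{database_id}/colls/{collection_id}"
    date = date or formatdate(usegmt=True)  # RFC 1123 date in GMT
    body = json.dumps({"query": query, "parameters": parameters or []})
    headers = {
        "Authorization": build_auth_token(
            "POST", "docs", resource_link, date, master_key
        ),
        "x-ms-date": date,
        "x-ms-version": "2017-02-22",  # assumed API version
        "x-ms-documentdb-isquery": "True",
        "Content-Type": "application/query+json",
        "Accept": "application/json",
    }
    url = f"https://{account}.documents.azure.com/{resource_link}/docs"
    return urllib.request.Request(
        url, data=body.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    # Placeholder values: replace with a real account, ids and key.
    request = build_query_request(
        account="myaccount",
        database_id="mydb",
        collection_id="mycoll",
        master_key=base64.b64encode(b"not-a-real-key").decode("ascii"),
        query="SELECT * FROM c WHERE c.city = @city",
        parameters=[{"name": "@city", "value": "London"}],
    )
    print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send the query; that step is left out here because it needs a real account and key.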

Configuring Power BI Gateway Data Sources For Files And Folders

by Chris Webb

Re-post from https://blog.crossjoin.co.uk/2017/07/14/configuring-power-bi-gateway-data-sources-for-files-and-folders/

Chris Webb's BI Blog

Recently I’ve been building a lot of Power BI reports from csv and Excel files, and to make sure that scheduled refresh works I have been setting up data sources in an On Premises Data Gateway (what used to be called the Enterprise Gateway). I had assumed that if I was connecting to file-based data sources in my Power BI dataset then, in the gateway, I would need to set up one data source for each file that I’m connecting to – which is a bit of a pain. In fact it turns out that you can set up a gateway data source for the folder that the files are in instead.

Let me give you an example. Imagine that you have three Excel files in a folder called C:\Sales Data:

image

Now imagine that you have three queries in Power BI that get data from these three files:

image

Here’s an…
