Tag Archives: Data Science
Jabil pilots Azure and Project Brainwave in advanced manufacturing solutions
Jabil provides advanced manufacturing solutions that require visual inspection of components on production lines. Their pilot with Azure Machine Learning and Project Brainwave promises dramatic improvements in speed and accuracy, reducing workload and improving focus for human operators.
Proud to be the lead architect working on advanced Machine Learning solutions and pipelines at Jabil.
Our machine learning development project makes it into //BUILD 2018 Keynote
Sometimes working on advanced technologies comes with the peril of NDAs … which limit what I can talk about… but it is nice to see yet another of our projects featured in the keynote speech by Satya Nadella, this time at Microsoft //BUILD 2018. Proud to be the lead architect working on advanced Machine Learning solutions and pipelines at Jabil.
Development Workflows for Data Scientists • Free eBook
Enabling Fast, Efficient, and Reproducible Results for Data Science • via GitHub
GitHub partnered with O’Reilly Media to examine how data science and analytics teams at several data-driven organizations are improving the way they define, enforce, and automate development workflows.
Download this complimentary book from: – https://resources.github.com/whitepapers/data-science/
Power BI custom visual from Visio
Visualize business process workflows, real-world layouts like factory floor plans, network diagrams, organization structures or any illustration created in Microsoft Visio and easily connect it to Power BI data. Contextually represent Power BI data as colours or text on Visio diagrams. Now drive Operational Intelligence effectively using the Visio custom visual.
AI for security: Microsoft Security Risk Detection makes debut
Full details at: – https://blogs.microsoft.com/next/2017/07/21/ai-for-security-microsoft-security-risk-detection-makes-debut/
Microsoft is making a cloud service that uses artificial intelligence to track down bugs in software generally available, and it will begin offering a preview version of the tool for Linux users as well.
Microsoft Security Risk Detection, previously known as Project Springfield, is a cloud-based tool that developers can use to look for bugs and other security vulnerabilities in the software they are preparing to release or use. The tool is designed to catch the vulnerabilities before the software goes out the door, saving companies the heartache of having to patch a bug, deal with crashes or respond to an attack after it has been released.
Free course on Deep Learning for Self-Driving Cars
A free course and introduction to deep learning through the applied task of building a self-driving car. Taught by Lex Fridman.
Visit http://selfdrivingcars.mit.edu/ for full details of “MIT 6.S094: Deep Learning for Self-Driving Cars”.
Data Science: Performance of Python vs Pandas vs Numpy
Re-post from http://machinelearningexp.com/data-science-performance-of-python-vs-pandas-vs-numpy/
Speed and time are key factors for any Data Scientist. In business, you do not usually work with toy datasets having thousands of samples. It is more likely that your datasets will contain millions or hundreds of millions of samples. Customer orders, web logs, billing events, stock prices – datasets now are huge.
I assume you do not want to spend hours or days waiting for your data processing to complete. The biggest dataset I have worked with so far contained over 30 million records. When I ran my data processing script for the first time on this dataset, the estimated time to complete was around 4 days! I do not have a very powerful machine (Macbook Air with i5 and 4 GB of RAM), but the most I could accept was running the script overnight, not for multiple days.
Thanks to some clever tricks, I was able to decrease this running time to a few hours. This post will explain the first step to achieving good data processing performance – choosing the right library/framework for your dataset.
The graph below shows the result of my experiment (details below), expressed as processing speed relative to the processing speed of pure Python.
As you can see, Numpy performance is several times higher than Pandas performance. I personally love Pandas for simplifying many tedious data science tasks, and I use it wherever I can. But if the expected processing time runs to many hours, then, with regret, I switch from Pandas to Numpy.
I am very aware that the actual performance may vary significantly, depending on the task and type of processing. So please treat these results as indicative only. There is no single test that can show an “overall” comparison of performance for any set of software tools.
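To get a feel for this kind of comparison on your own machine, here is a minimal sketch. It is not the experiment from the original post, whose details are not reproduced in this excerpt. It assumes a deliberately simple task (averaging one million random floats), and the function names `mean_pure_python`, `mean_pandas`, `mean_numpy` and `relative_speed` are made up for illustration. As in the graph described above, each result is expressed as speed relative to pure Python.

```python
import timeit

import numpy as np
import pandas as pd


def mean_pure_python(values):
    """Average a list of floats with an explicit Python loop."""
    total = 0.0
    for value in values:
        total += value
    return total / len(values)


def mean_pandas(series):
    """Average a pandas Series using its vectorized method."""
    return float(series.mean())


def mean_numpy(array):
    """Average a NumPy array using its vectorized method."""
    return float(array.mean())


def relative_speed(n=1_000_000, repeats=3):
    """Return each approach's speed as a multiple of pure Python's speed."""
    rng = np.random.default_rng(0)  # fixed seed so every run uses the same data
    array = rng.random(n)
    series = pd.Series(array)
    values = array.tolist()

    candidates = {
        "python": lambda: mean_pure_python(values),
        "pandas": lambda: mean_pandas(series),
        "numpy": lambda: mean_numpy(array),
    }
    # Best-of-N wall-clock time per approach, one call per measurement.
    best = {
        name: min(timeit.repeat(func, number=1, repeat=repeats))
        for name, func in candidates.items()
    }
    baseline = best["python"]
    return {name: baseline / seconds for name, seconds in best.items()}


if __name__ == "__main__":
    for name, speed in relative_speed().items():
        print(f"{name:>6}: {speed:.1f}x pure Python")
```

The ratios depend heavily on the task, the data size and the hardware, so a simple aggregation like this one says little about row-by-row processing or joins. Swap in an operation that resembles your real workload before drawing conclusions.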
Posted on July 15, 2017 by
see full post @ http://machinelearningexp.com/data-science-performance-of-python-vs-pandas-vs-numpy/