Friday, November 10, 2017

BigData White Papers

I don't know about you, but I always like to read the white papers that originated OpenSource projects (when available, of course :) ).

I have been working with BigData quite a lot lately, and this area is mostly dominated by Apache OpenSource projects.

So, naturally (given the nerd that I am), I tried to investigate their history and compiled a list of the articles and companies that originated most of the BigData Apache projects.

Here it is! Hope you guys find it interesting too. :)

Apache Hadoop 

Based on: Google MapReduce and GFS 

Apache Spark 

Created by: University of California, Berkeley 

Apache Hive 

Created by: Facebook

Apache Impala 

Based on: Google F1

Apache HBase

Based on: Google BigTable

Apache Drill 

Based on: Google Dremel

Apache Pig 

Created by: Yahoo!

Apache Oozie 

Created by: Yahoo!

Apache Sqoop 

Started as a module for Apache Hadoop, contributed by Aaron Kimball via a Hadoop issue.

Apache Flume

Created by: Cloudera


Friday, August 11, 2017

Deep Learning, TensorFlow and Tensor Core

I was lucky enough to get a ticket to Google I/O 2017 through the Google Code Jam for Women (for the girls that don't know, Google runs programming contests for women, and the top finishers win tickets to the conference).

One of the main topics of the conference was for sure Google's Deep Learning library, TensorFlow: an OpenSource Machine Learning library that runs on both CPU and GPU.

Two very cool things were presented at Google I/O:

  •  TPU (Tensor Processing Unit) - a processor designed specifically for TensorFlow workloads that can be used on the Google Cloud Platform
  •  TensorFlow Lite - a lightweight version of TensorFlow that runs on Android and makes developers' lives easier

Last week, at a BigData meetup in Chicago, I learned that Nvidia has also created dedicated GPU hardware for Deep Learning processing: the Tensor Core.

With all this infrastructure and these APIs becoming available, Deep Learning gets considerably easier and faster. At Google I/O, Sundar Pichai mentioned that Google has been using Machine Learning for almost everything, and is even using Deep Learning to design Deep Learning networks!

TensorFlow's API is so high-level that even someone with little technical background can develop something interesting with it. Sundar also shared the story of a high school student who used the library to help detect some types of cancer.

It seems that Data Science is becoming attainable.

Wednesday, August 2, 2017

Dummy errors when using neuralnet package in R

Ok, so you read a bunch of stuff on how to build Neural Networks, how many layers or nodes you should add, etc... But when you start to implement an actual Neural Network, you face a ton of dummy errors that stop your beautiful, inspirational programming.

This post talks about some errors you might face when using the neuralnet package in R.

First, remember, to use the package you should install it:

install.packages("neuralnet")

and then run

library(neuralnet)

to load the package.

Error 1

One error that might happen while training your neural network is this:

nn <- neuralnet(formula1,data=new_data, hidden=c(5,3))

Error in terms.formula(formula) : invalid model formula in ExtractVars

This happens when the variable names in the formula "formula1" are in an unsupported format. For example, if you named your columns (or variables) as numbers, you would get this error. So change your column names and re-run the model!


label ~ 1 + 2 + 3 + 4 + 5

Change to:

label ~ v1 + v2 + v3 + v4 + v5
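As a sketch of the fix (the data frame `new_data` and its `label` column here are made up for illustration), you can rename numeric columns and rebuild the formula programmatically:

```r
# A toy data frame whose predictor columns were named as numbers
# (this is exactly what triggers the ExtractVars error)
new_data <- data.frame(matrix(rnorm(60), ncol = 6))
names(new_data) <- c("label", "1", "2", "3", "4", "5")

# Rename the predictors to syntactically valid names...
names(new_data)[-1] <- paste0("v", 1:5)

# ...and rebuild the formula from the new names
formula1 <- as.formula(paste("label ~",
                             paste(names(new_data)[-1], collapse = " + ")))
formula1
# label ~ v1 + v2 + v3 + v4 + v5
```

Building the formula from names(new_data) like this also means you never have to retype it when the column set changes.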

Error 2

Another error you might get is the following:

nn <- neuralnet(f, data=train[,-1], hidden=c(3,3))

Warning message:  algorithm did not converge in 1 of 1 repetition(s) within the stepmax

To solve this, you can increase the "stepmax" parameter, which controls the maximum number of training steps:

nn <- neuralnet(f, data=train[,-1], hidden=c(3,3), stepmax=1e6)

If that doesn't work, you might have to change other parameters to make it converge: try reducing the number of hidden nodes or layers, or changing the size of your training data.

Error 3

The third error I want to discuss happens when actually computing the output of the neural network:

net.compute <- compute(net, matrix.train2[,1:10])

Error in neurons[[i]] %*% weights[[i]] : non-conformable arguments

This error occurs when the number of columns in the data frame you use for prediction differs from the number of columns used to train the neural network. The data frames passed to neuralnet and compute must have the same columns, with the same names!
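A simple guard you can add before calling compute is to compare column names first (a sketch: `train_vars` and `new_data` are made up here, standing in for `net$model.list$variables`, where neuralnet keeps the covariate names it was trained with, and your prediction data):

```r
# Covariate names the network was trained with (hard-coded here;
# in practice, read them from net$model.list$variables)
train_vars <- c("v1", "v2", "v3")

# Hypothetical data frame we want predictions for
new_data <- data.frame(v1 = rnorm(5), v2 = rnorm(5), v3 = rnorm(5))

# Columns must match in name, order, and count, or compute() fails
# with "non-conformable arguments"
stopifnot(identical(colnames(new_data), train_vars))
```

Failing fast on this check gives you a readable error message instead of the cryptic matrix-multiplication one.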

That is it! If you faced any other dummy error with the neuralnet package, send it to me and I can add it to the post! Good luck! :D