Up and running with PySpark on Windows


If you are coming from an R (or Python/Pandas) environment like me, you probably feel right at home processing CSV files with R or Python/Pandas. Whether you are working with time-series sensor data, random CSV files, or something else, R and Pandas can take it! And if you can step away from the R and Python/Pandas mindset, Spark really goes to great lengths to make an R or Python/Pandas user feel welcome.

Lately I have been working extremely closely with AWS EMR. I am not talking about creating a couple of trivial notebooks with a 5×5 data frame containing fruit names. The data set I am working with is tens of gigabytes, stored away in the cloud, and it is far from clean. I need to create an ETL pipeline to retrieve historical information, which I will use to train my machine learning models. How I run predictive analysis on new incoming data with machine learning is probably a post (or series of posts) for a later date. Today, I want to get you up and running with PySpark in no time!

Why am I writing this post?

There is already a plethora of blogs and forums on the internet about how to install PySpark on Windows. They mainly focus on setting up just PySpark. But what if I want to use Anaconda or Jupyter Notebooks, or do not wish to use Oracle JDK? This post picks up where most other content falls short. I want to help you connect the dots and save a lot of time, agony, and frustration, regardless of whether you are new to Windows, Spark/PySpark, or development in general.

This process is as easy as ABC!

Benefit

The main benefit of the approach I suggest in this post is that you do not have to install anything (for the most part), and you can switch Spark, Hadoop, and Java versions in seconds!
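
One way to picture switching versions in seconds is to keep each Spark, Hadoop, and Java build unpacked in its own folder and point a few environment variables at the set you want. The sketch below assumes that layout. The function name use_stack, the folder structure, and the base directory are hypothetical and are not taken from the setup this post describes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point the current shell at one unpacked Spark/Hadoop/Java set.
# Assumed layout (illustrative only):
#   <base>/spark/<version>, <base>/hadoop/<version>, <base>/java/<version>
use_stack() {
    local base="$1" spark_ver="$2" hadoop_ver="$3" java_ver="$4"

    export SPARK_HOME="$base/spark/$spark_ver"
    export HADOOP_HOME="$base/hadoop/$hadoop_ver"
    export JAVA_HOME="$base/java/$java_ver"

    # Warn (but do not fail) when a selected folder is missing.
    local dir
    for dir in "$SPARK_HOME" "$HADOOP_HOME" "$JAVA_HOME"; do
        [ -d "$dir" ] || echo "warning: $dir does not exist" >&2
    done

    # Put the chosen binaries first on PATH, for this shell session only.
    export PATH="$SPARK_HOME/bin:$HADOOP_HOME/bin:$JAVA_HOME/bin:$PATH"
}
```

Calling use_stack again with different version folders repoints the same three variables, so nothing needs to be uninstalled or reinstalled to switch.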

Let’s get started!

Continue reading “Up and running with PySpark on Windows”

Windows: configure VS Code integrated bash shell for Anaconda


So you use, or have been using, Python on Windows. You know your way around setting up the PATH variable so that you can type “python” in your command prompt and it works. Now, say that you want to use Anaconda Python in bash. Let’s go one step further and say you want to use bash from the Visual Studio Code integrated shell. The process isn’t too different. There doesn’t seem to be a guide that covers all of these together, hence this post.

My goal is to show you one of the possible ways to configure your development environment quickly, to get you going in no time.

At the end you should have the following:

  • A bash shell working with Python, and
  • Visual Studio Code shell integration (optional)
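
As a rough idea of what "bash working with Python" involves, here is a minimal sketch of a snippet for ~/.bashrc that puts Anaconda on PATH. The helper name path_prepend and the default install location are assumptions for illustration. On a typical Windows Anaconda install, python sits in the install root and conda sits in the Scripts folder, but check your own machine.

```shell
# Hypothetical helper for ~/.bashrc: prepend a directory to PATH only once.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already on PATH, do nothing
        *) PATH="$1:$PATH" ;;
    esac
}

# Assumed install location; adjust to wherever Anaconda lives on your machine.
ANACONDA_HOME="${ANACONDA_HOME:-$HOME/Anaconda3}"

path_prepend "$ANACONDA_HOME/Scripts"   # conda and other command-line tools
path_prepend "$ANACONDA_HOME"           # python itself
export PATH
```

Because the helper skips directories that are already present, opening nested shells (as the VS Code integrated terminal can do) does not keep growing PATH.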

Continue reading “Windows: configure VS Code integrated bash shell for Anaconda”

Restarting ALSA Audio


Follow these steps:

sudo /etc/init.d/alsa-utils stop
sudo alsa force-reload
sudo /etc/init.d/alsa-utils start

When I was running openSUSE 11.1 in the previous decade, the ALSA sound driver would sometimes throw an error while playing video with VLC media player. The solution was simply to restart the ALSA sound driver by running the following command as super-user:

/etc/init.d/alsasound restart
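
The two recipes above differ mainly in which init script the distribution ships. A small sketch that picks whichever one is present could look like this. The function name alsa_restart_cmd and its optional directory argument are hypothetical, and the function only prints the command instead of running it.

```shell
# Hypothetical helper: print the restart command matching the init script that exists.
# The optional argument (default /etc/init.d) is there only to make the sketch testable.
alsa_restart_cmd() {
    local initd="${1:-/etc/init.d}"
    if [ -x "$initd/alsasound" ]; then
        echo "sudo $initd/alsasound restart"
    elif [ -x "$initd/alsa-utils" ]; then
        echo "sudo $initd/alsa-utils stop && sudo alsa force-reload && sudo $initd/alsa-utils start"
    else
        echo "no known ALSA init script found in $initd" >&2
        return 1
    fi
}
```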

Virtual Box boot from USB


You may want to do this for a number of reasons. You may have a bootable USB thumb drive / USB flash drive / USB stick (whatever you call it) containing a Live CD, an installation image, etc. that you want to try out before you actually use it on your computer, or maybe you don’t want to use that bootable USB on your computer at all. Whatever the case might be.

Linux

Following are three different methods you could use.

Method # 1: Create a pointer to your USB

I am using Ubuntu 18.04 LTS, but it could be any Linux OS/distro/flavor. If you have a bootable USB that you want to boot your VM from, go ahead and insert it.

First you need to find the logical device for your removable USB flash drive. One way to do it is to use the lshw command (ls for hardware, get it?). It is recommended that you run this command as super-user (sudo); otherwise the warning “your output may be incomplete or inaccurate, you should run this program as super-user” is displayed, which makes sense. If you need more information on lshw, including installation and basic usage, see this project website or this article.

Here is the raw command, which will output EVERYTHING:

# "sudo lshw" shows everything
$ sudo lshw -class volume -disable TEST -notime

Look for the entry associated with your drive’s label. Alternatively, the following command is much more concise if you know what you are looking for:

$ sudo lshw -businfo -disable TEST | grep volume

In my case, from the first command above, it was /dev/sdb1.
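
If you would rather not read the device path off the screen, a small filter can pull it out of the -businfo listing. This is a minimal sketch: the function name volume_devices is hypothetical, and it assumes each volume line carries its device node as a field starting with /dev/.

```shell
# Hypothetical helper: read "lshw -businfo" style text on stdin and print the
# /dev/... field of every line that has "volume" as a field (the same lines the
# grep above keeps).
volume_devices() {
    awk '{
        is_volume = 0; dev = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "volume") is_volume = 1
            if ($i ~ /^\/dev\//) dev = $i
        }
        if (is_volume && dev != "") print dev
    }'
}

# Real use (needs lshw and root):
#   sudo lshw -businfo -disable TEST | volume_devices
```

The output lists every volume on the machine, including those on your internal disks, so you still need to pick the one that belongs to the USB drive.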

Next, Continue reading “Virtual Box boot from USB”

Ubuntu: deploy .NET Core app


Today, I want to walk through the steps I used to deploy an ASP.NET Core web application to Ubuntu Server. ASP.NET Core supports several Linux distributions; I am using Ubuntu Server.

From a quick internet search I found 3 decent blog posts:

  1. decatechlabs.com article
  2. garywoodfine.com article
  3. blog.bobbyallen.me post

There are already many articles about how to set up your development environment for .NET Core, but this post starts where they end. It’s about getting production-ready.

The three posts above are to the point and well written. But either they are more than a year old, or they are about setting up your development environment, not production deployment. You need to install the .NET Core runtime, not the .NET Core SDK (which also includes the runtime). You can download it from: Continue reading “Ubuntu: deploy .NET Core app”

[4/4] Docker: Front-end development w/ Java, SpringBoot MVC & RESTful Web API


Today we are going to talk about adding a front-end user interface to our application, from scratch. We can add the front end using something called view resolvers. Our options include Apache Tiles and JavaServer Pages (JSP), among many others. Spring Boot supports FreeMarker templates, Groovy templates, and Thymeleaf as first-class citizens via “AutoConfiguration”. As the name suggests, we should not need to do a whole lot to get going with one of these. In this video we will be looking at Thymeleaf. I find it easy to use and feature-rich at the same time. Thymeleaf is mostly HTML. Finally, we will talk about WebJars and how to add branding to our web application using responsive web design. Continue reading “[4/4] Docker: Front-end development w/ Java, SpringBoot MVC & RESTful Web API”

AngularJS Tools


According to a sendesignz.com post

[3/4] Docker: SpringBoot, Hibernate & Web API


In previous episodes (Part1, Part2) we saw how to create a Java Maven project from scratch using SpringBoot, followed by how to deploy the application to Docker. I also demonstrated how you can set up a MySQL database server with automated initialization of a fresh database instance.

Now, let’s see how to set up a RESTful Web API to display data from the database using Hibernate ORM. We’ll also see how to set up a local development environment and a Docker deployment environment so that you can quickly switch between the two and establish an efficient workflow for your project. Next, we will set up our project to perform CRUD operations using Hibernate. And finally, we will create our Web API endpoints, which can serve requests made from the browser in JSON format.

Let’s Continue reading “[3/4] Docker: SpringBoot, Hibernate & Web API”