Anish C

Software Engineer

Posts by Year

2018 1
2017 8

2018

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

2017

Akka Actors vs Streams for Rest APIs

11 minute read

A comparison of use cases for Spray IO (on Akka Actors) and Akka HTTP (on Akka Streams) for creating REST APIs

Implementing Statistical Mode in Apache Spark

8 minute read

Finding the most common value in parallel across nodes, and exposing it as an aggregate function.

Analyzing Java Garbage Collection Logs for debugging and optimizing Apache Spark jobs

10 minute read

Understanding how Spark runs on JVMs and how the memory is managed in each JVM.

Using HashSet based indexes in Apache Spark

14 minute read

This post is about de-duplicating data while loading it into tables, using HashSet-based indexes in Apache Spark.

Exception Handling in Spark Data Frames

7 minute read

General Exception Handling

SQL Analytical Functions - II - Window Frames, ROWS and RANGE

2 minute read

A follow-up post about specifying window frames for SQL analytical functions. This assumes you have already read my previous post where I described the use of...

SQL Analytical Functions - I - Overview, PARTITION BY and ORDER BY

6 minute read

For a long time I faced many problems while working with databases and SQL where, in order to get a better understanding of the available data, simpl...

Geo Location Batch Search in Spark

3 minute read

A module written in Scala for Apache Spark v2.0.0 to batch process mapping of Geo Locations in two skewed data sets. Link to code: https://github.com/anish74...
