Posts by Tag

A module written in Scala for Apache Spark v2.0.0 that batch-processes the mapping of geo locations between two skewed data sets. Link to code: https://github.com/anish74...

spark

Implementing Statistical Mode in Apache Spark

8 minute read

Finding the most common value in parallel across nodes, and exposing that as an aggregate function.

Analyzing Java Garbage Collection Logs for debugging and optimizing Apache Spark jobs

10 minute read

Understanding how Spark runs on JVMs and how the memory is managed in each JVM.

Using HashSet based indexes in Apache Spark

14 minute read

This post is about de-duplicating data while loading it into tables, using HashSet-based indexes in Apache Spark.

Exception Handling in Spark Data Frames

7 minute read

General Exception Handling

Geo Location Batch Search in Spark

3 minute read

A module written in Scala for Apache Spark v2.0.0 that batch-processes the mapping of geo locations between two skewed data sets. Link to code: https://github.com/anish74...

join

Using HashSet based indexes in Apache Spark

14 minute read

This post is about de-duplicating data while loading it into tables, using HashSet-based indexes in Apache Spark.

Geo Location Batch Search in Spark

3 minute read

A module written in Scala for Apache Spark v2.0.0 that batch-processes the mapping of geo locations between two skewed data sets. Link to code: https://github.com/anish74...

optimization

Using HashSet based indexes in Apache Spark

14 minute read

This post is about de-duplicating data while loading it into tables, using HashSet-based indexes in Apache Spark.

Geo Location Batch Search in Spark

3 minute read

A module written in Scala for Apache Spark v2.0.0 that batch-processes the mapping of geo locations between two skewed data sets. Link to code: https://github.com/anish74...

sql

SQL Analytical Functions - II - Window Frames, ROWS and RANGE

2 minute read

A follow-up post about specifying window frames for SQL analytical functions. This assumes you have already read my previous post where I described the use of...

SQL Analytical Functions - I - Overview, PARTITION BY and ORDER BY

6 minute read

For a long time I faced a lot of problems while working with databases and SQL where in order to get a better understanding of the available data, simpl...

analytics

SQL Analytical Functions - II - Window Frames, ROWS and RANGE

2 minute read

A follow-up post about specifying window frames for SQL analytical functions. This assumes you have already read my previous post where I described the use of...

SQL Analytical Functions - I - Overview, PARTITION BY and ORDER BY

6 minute read

For a long time I faced a lot of problems while working with databases and SQL where in order to get a better understanding of the available data, simpl...

data-exploration

SQL Analytical Functions - II - Window Frames, ROWS and RANGE

2 minute read

A follow-up post about specifying window frames for SQL analytical functions. This assumes you have already read my previous post where I described the use of...

SQL Analytical Functions - I - Overview, PARTITION BY and ORDER BY

6 minute read

For a long time I faced a lot of problems while working with databases and SQL where in order to get a better understanding of the available data, simpl...

analytical-functions

SQL Analytical Functions - II - Window Frames, ROWS and RANGE

2 minute read

A follow-up post about specifying window frames for SQL analytical functions. This assumes you have already read my previous post where I described the use of...

SQL Analytical Functions - I - Overview, PARTITION BY and ORDER BY

6 minute read

For a long time I faced a lot of problems while working with databases and SQL where in order to get a better understanding of the available data, simpl...

aws

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

Analyzing Java Garbage Collection Logs for debugging and optimizing Apache Spark jobs

10 minute read

Understanding how Spark runs on JVMs and how the memory is managed in each JVM.

s3

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

Analyzing Java Garbage Collection Logs for debugging and optimizing Apache Spark jobs

10 minute read

Understanding how Spark runs on JVMs and how the memory is managed in each JVM.

mapreduce

Geo Location Batch Search in Spark

3 minute read

A module written in Scala for Apache Spark v2.0.0 that batch-processes the mapping of geo locations between two skewed data sets. Link to code: https://github.com/anish74...

spacial

Geo Location Batch Search in Spark

3 minute read

A module written in Scala for Apache Spark v2.0.0 that batch-processes the mapping of geo locations between two skewed data sets. Link to code: https://github.com/anish74...

data-engineering

Exception Handling in Spark Data Frames

7 minute read

General Exception Handling

data-frames

Exception Handling in Spark Data Frames

7 minute read

General Exception Handling

data-errors

Exception Handling in Spark Data Frames

7 minute read

General Exception Handling

indexes

Using HashSet based indexes in Apache Spark

14 minute read

This post is about de-duplicating data while loading it into tables, using HashSet-based indexes in Apache Spark.

hashset

Using HashSet based indexes in Apache Spark

14 minute read

This post is about de-duplicating data while loading it into tables, using HashSet-based indexes in Apache Spark.

parquet

Analyzing Java Garbage Collection Logs for debugging and optimizing Apache Spark jobs

10 minute read

Understanding how Spark runs on JVMs and how the memory is managed in each JVM.

java-gc

Analyzing Java Garbage Collection Logs for debugging and optimizing Apache Spark jobs

10 minute read

Understanding how Spark runs on JVMs and how the memory is managed in each JVM.

udaf

Implementing Statistical Mode in Apache Spark

8 minute read

Finding the most common value in parallel across nodes, and exposing that as an aggregate function.

statistics

Implementing Statistical Mode in Apache Spark

8 minute read

Finding the most common value in parallel across nodes, and exposing that as an aggregate function.

functions

Implementing Statistical Mode in Apache Spark

8 minute read

Finding the most common value in parallel across nodes, and exposing that as an aggregate function.

akka

Akka Actors vs Streams for Rest APIs

11 minute read

A comparison of use cases for Spray IO (on Akka Actors) and Akka HTTP (on Akka Streams) for creating REST APIs

akka-actors

Akka Actors vs Streams for Rest APIs

11 minute read

A comparison of use cases for Spray IO (on Akka Actors) and Akka HTTP (on Akka Streams) for creating REST APIs

akka-streams

Akka Actors vs Streams for Rest APIs

11 minute read

A comparison of use cases for Spray IO (on Akka Actors) and Akka HTTP (on Akka Streams) for creating REST APIs

akka-http

Akka Actors vs Streams for Rest APIs

11 minute read

A comparison of use cases for Spray IO (on Akka Actors) and Akka HTTP (on Akka Streams) for creating REST APIs

spray-io

Akka Actors vs Streams for Rest APIs

11 minute read

A comparison of use cases for Spray IO (on Akka Actors) and Akka HTTP (on Akka Streams) for creating REST APIs

data

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

data-lake

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

cloud

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

system-design

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

big-data

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

gcs

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

gcp

Big Data Lake in the AWS Cloud

9 minute read

Using AWS S3 as a Big Data Lake and its alternatives

Anish C
