GeoJson Operations in Apache Spark with Seahorse SDK

A few days ago we released Seahorse 1.4, an enhanced version of our machine learning, Big Data manipulation and data visualization product. This release also comes with an SDK – a Scala toolkit for creating new custom operations to be used in Seahorse. As a showcase, we will create a custom geospatial operation with GeoJson […]

Scheduling Spark jobs in Seahorse

In the latest Seahorse release we introduced scheduling of Spark jobs. We will show you how to use it to regularly collect data and email reports generated from that data. Use case: Let’s say that we have a local weather station and the data from this station is uploaded automatically to Google […]

R Notebook and Custom R Operations in the new Seahorse release

Presenting new features in Seahorse, Release 1.3 – custom operations in R and enhanced data exploration capabilities in an R Notebook.

US Baby Names – Data Visualization

A few days ago we released Seahorse 1.1, an enhanced version of our machine learning, Big Data manipulation and visualization product. Today, we will show you how the new version of Seahorse can be used for data mining and data visualization.

Improve Apache Spark aggregate performance with batching

Seahorse provides users with reports on their data at every step in the workflow. A user can view reports after each operation to review the intermediate results. In our reports we provide users with distributions for columns in the form of a histogram for continuous data, and a pie chart for categorical data.
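As a rough illustration of the histogram side of such a report, the sketch below bins a continuous column into equal-width buckets. It is plain Python rather than Spark, the function name `histogram` and the equal-width binning rule are assumptions made for this example, and it is not taken from the Seahorse code base.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int) -> List[int]:
    """Count values into `bins` equal-width buckets spanning [min, max]."""
    if bins <= 0:
        raise ValueError("bins must be positive")
    if not values:
        return [0] * bins

    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so there is no range to split:
        # put everything into the first bucket.
        return [len(values)] + [0] * (bins - 1)

    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value would compute index == bins, so clamp it
        # into the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts
```

Each returned count corresponds to one bar of the histogram; a categorical column would instead be summarised by counting occurrences of each distinct value.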

Should I eat this mushroom?

A few days ago we released Seahorse 1.0, a visual platform for machine learning and Big Data manipulation, available to everyone for free! Today, we show you how to use Seahorse to solve a simple classification problem.

Fast and accurate categorical distribution without reshuffling in Apache Spark

In Seahorse we want to provide our users with accurate distributions for their categorical data. Categorical data can be thought of as observations that each take exactly one of K possible outcomes. Some examples: nationality, marital status, gender, type of education.
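To make the idea of a categorical distribution concrete, here is a minimal sketch in plain Python. It assumes the data is already split into partitions (modelled here as a list of lists), counts categories inside each partition, and then merges the small per-partition counters. The function name and the partition model are hypothetical, and this is an illustration of the general idea rather than the approach Seahorse uses.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def categorical_distribution(
    partitions: Iterable[List[Hashable]],
) -> Dict[Hashable, float]:
    """Return the relative frequency of each category across all partitions."""
    total: Counter = Counter()
    for part in partitions:
        # Count locally within the partition, then merge the counter.
        # Only the per-category counts are combined, not the raw rows.
        total.update(Counter(part))

    n = sum(total.values())
    if n == 0:
        return {}
    return {category: count / n for category, count in total.items()}
```

With K distinct categories, each per-partition counter holds at most K entries, so the merge step stays small however many rows a partition contains.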