Optimize Spark with DISTRIBUTE BY & CLUSTER BY

The DISTRIBUTE BY and CLUSTER BY clauses are really cool features in Spark SQL. Unfortunately, this subject remains relatively unknown to most users. This post aims to change that.

CodiLime co-founder and CEO wins Poland’s top business award for Vision and Innovation

Tomasz Kulakowski, co-founder and CEO of CodiLime, the sole investor in the Big Data science company deepsense.io, is the winner of Polish Business Roundtable’s 2016 Vision and Innovation Award. PRB’s Jan Wejchert Awards are the most prestigious prizes in the Polish business community.

US Baby Names - Data Visualization

A few days ago we released Seahorse 1.1, an enhanced version of our machine learning, Big Data manipulation and visualization product. Today, we will show you how the new version of Seahorse can be used for data mining and data visualization.

deepsense.io launches Seahorse 1.1 at Hadoop Summit Europe 2016 in Dublin

The latest version of Seahorse, deepsense.io’s flagship Big Data product, adds new features and an improved UI.

deepsense.io presents deep learning and Big Data accomplishments at GTC and Hadoop Summit

deepsense.io experts to present Big Data and deep learning accomplishments at Silicon Valley and Dublin conferences.

Improve Apache Spark aggregate performance with batching

Seahorse provides users with reports on their data at every step in the workflow. A user can view reports after each operation to review the intermediate results. In our reports we provide users with distributions for columns in the form of a histogram for continuous data, and a pie chart for categorical data.

AAIA'16 Data Mining Challenge Winning

deepsense.io tops global competition for predicting dangerous seismic events in active coal mines.

Should I eat this mushroom?

A few days ago we released Seahorse 1.0, a visual platform for machine learning and Big Data manipulation, available to everyone for free! Today, we show you how to use Seahorse to solve a simple classification problem.

deepsense.io to Unveil Seahorse 1.0 at Spark Summit East 2016

New product version and corporate workshop series target the world’s premier big data gathering of Apache Spark professionals.

Fast and accurate categorical distribution without reshuffling in Apache Spark

In Seahorse we want to provide our users with accurate distributions for their categorical data. Categorical data can be thought of as possible results of an observation that can take one of K possible outcomes. Some examples: Nationality, Marital Status, Gender, Type of Education.
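To make the idea of a categorical distribution concrete, here is a minimal sketch in plain Python. It is not taken from the post and is not necessarily how Seahorse computes it. It assumes the data is already split into partitions, counts categories locally in each one, and merges only the small per-partition counts, so the rows themselves never move. The function names and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List


def partition_counts(partition: Iterable[Hashable]) -> Counter:
    """Count category occurrences within one partition (local work only)."""
    return Counter(partition)


def categorical_distribution(
    partitions: List[List[Hashable]],
) -> Dict[Hashable, float]:
    """Merge per-partition counts and normalise them to relative frequencies."""
    total: Counter = Counter()
    for part in partitions:
        # Only the compact count tables are combined, never the raw rows.
        total.update(partition_counts(part))
    n = sum(total.values())
    if n == 0:
        return {}
    return {category: count / n for category, count in total.items()}


# Hypothetical data: marital status values split across two partitions.
partitions = [
    ["single", "married", "married"],
    ["married", "divorced", "single", "married", "single"],
]
distribution = categorical_distribution(partitions)
```

With the sample data above, `distribution` maps each of the three observed categories to its share of the eight rows, and the shares sum to 1.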