Improve Apache Spark aggregate performance with batching
/in Big data & Spark, Seahorse /by Adam Jakubowski
Seahorse provides users with reports on their data at every step in the workflow. A user can view reports after each operation to review the intermediate results. In our reports we provide users with distributions for columns in the form of a histogram for continuous data, and a pie chart for categorical data.
Should I eat this mushroom?
/in Big data & Spark, Seahorse /by Grzegorz Chilkiewicz
A few days ago we released Seahorse 1.0, a visual platform for machine learning and Big Data manipulation, available to everyone for free! Today we show you how to use Seahorse to solve a simple classification problem.
Fast and accurate categorical distribution without reshuffling in Apache Spark
/in Big data & Spark, Seahorse /by Adam Jakubowski
In Seahorse we want to provide our users with accurate distributions for their categorical data. Categorical data can be thought of as possible results of an observation that can take one of K possible outcomes. Some examples: Nationality, Marital Status, Gender, Type of Education.
Cooperative data exploration
/in Big data & Spark /by Piotr Łusakowski
Living in a world of big data comes with a certain challenge: how to extract value from the ever-growing flow of information that comes our way. There are a lot of great tools that can help us, but they all require a lot of resources. So how do we ease this demand on CPU and RAM? One way to do it is to share the data we are working on, and the results of our computations, with others.
Exploration of data from iPhone motion coprocessor (2)
/in Data science /by Przemyslaw Biecek
Last week we downloaded data from a fitness tracker (the motion coprocessor in an iPhone) and loaded it into R. Then, with just a few lines of R code, we decomposed the data into a seasonal weekly component and a trend. Today we are going to see how to plot the number of steps per hour for different days of the week. The same data will then be used to check how often there was any activity at a given time.
Which whale is it, anyway? Face recognition for right whales using deep learning
/in Data science, Deep learning, Machine learning /by Robert Bogucki
Right Whale Recognition was a computer vision competition organized by NOAA Fisheries on the Kaggle.com data science platform. Our machine learning team at deepsense.ai finished 1st! In this post we describe our solution.
Exploration of data from iPhone motion coprocessor
/in Data science /by Przemyslaw Biecek
During the Christmas break I met my brother-in-law, who is an ultimate gadgeteer (an excellent trait for a brother). He told me that most iPhones have a built-in motion coprocessor and count steps by default. There is no need to turn anything on; it works all the time (assuming that the phone is with you).
How to create a new geom for ggplot2
/in Data science /by Przemyslaw Biecek
The new version of the ggplot2 package (v 2.0.0) will be available on CRAN in a few days.
It has a very nice mechanism for adding new geoms and stats.
Hack the Proton
/in Data science /by Przemyslaw Biecek
I’ve prepared a short console-based, data-driven R game named “The Proton Game” or “Hack the Proton” (I still cannot decide which name is better). The goal of the player is to play the hacker and infiltrate Slawomir Pietraszko’s account on a Proton server. To do this, you have to solve four data-based puzzles.
eXtreme Gradient Boosting vs Random Forest [and the caret package for R]
/in Data science /by Przemyslaw Biecek
Decision trees are cute.
They are easy to visualize, easy to explain, easy to apply, and even easy to construct.
Unfortunately, they are quite unstable, particularly for large sets of correlated features.