Big data with Spark for data scientists

Machine learning training


Python syntax

Skills your team will gain

An understanding of challenges in optical character recognition problems.

Experience in creating OCR solutions using modern deep learning methods.


1 day


Part 1

Introduction to Spark

  • MapReduce paradigm in Spark
  • Broadcasts and accumulators
  • Caching
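
To give a feel for the MapReduce paradigm listed above, here is a minimal plain-Python sketch of a word count split into map, shuffle, and reduce steps. It is only an illustration and does not use Spark; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample sentences are assumptions made for this example. In PySpark, the analogous steps would typically be expressed on an RDD with `flatMap`, `map`, and `reduceByKey`.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, as a shuffle does between stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values collected for each key into a single count."""
    return {
        key: reduce(lambda left, right: left + right, values)
        for key, values in groups.items()
    }


def word_count(lines):
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["spark makes big data simple", "big data with spark"])
```

In a real cluster the map and reduce steps run in parallel on partitions of the data, and the shuffle moves records between machines; the sketch keeps everything in one process so the data flow is easy to follow.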

Part 2

Spark SQL

  • Dataframes
  • RDDs vs Dataframes vs Datasets
  • User-defined functions

Part 3

Data science in Spark – Spark MLlib

  • Machine learning pipelines
  • Data preparation
  • Linear and logistic regression
  • Random forests
  • Evaluation and cross-validation
