Big data with Spark for engineers – basic workshop

Machine learning training


Python syntax

Skills your team will gain

An understanding of challenges in optical character recognition problems.

Experience in creating OCR solutions using modern deep learning methods.


2 days


Part 1

Introduction to Spark

  • Functional programming in Scala
  • MapReduce paradigm in Spark
  • Broadcasts, accumulators, caching
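
A minimal plain-Python sketch of the MapReduce word count, assuming no Spark installation. The hypothetical helpers `map_phase`, `shuffle_phase` and `reduce_phase` imitate the three stages that Spark runs across a cluster, so the data flow can be traced locally. It illustrates the paradigm and is not Spark's implementation.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# Illustration only: plain Python, no Spark. All helper names are hypothetical.

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every line.
    return [(word, 1) for line in lines for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group values by key (Spark moves them between executors).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: fold each key's values with an associative, commutative function.
    return {key: reduce(add, values) for key, values in groups.items()}

def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))

print(sorted(word_count(["to be or not", "to be"]).items()))
# [('be', 2), ('not', 1), ('or', 1), ('to', 2)]
```

In PySpark the same pipeline is typically written as `rdd.flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(add)`. `reduceByKey` also combines values within each partition before the shuffle, which is why the reducing function has to be associative and commutative.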

Part 2

Spark SQL

  • DataFrames
  • RDDs vs DataFrames vs Datasets
  • User-defined functions (UDFs)

Part 3

Data science in Spark – Spark MLlib

  • Managing and tracking experiments – ML Pipelines
  • Linear regression, random forests
  • Cross-validation and evaluation on Spark
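
To make the cross-validation topic concrete, here is a small plain-Python sketch of k-fold cross-validation around a single-feature least-squares line. The names `fit_line`, `rmse` and `k_fold_cv` are hypothetical. In Spark MLlib the corresponding role is played by `CrossValidator` (with its `numFolds` parameter) wrapping an estimator or a `Pipeline`. For determinism, this sketch assigns rows to folds by index modulo k rather than randomly.

```python
import math

# Illustration only: plain Python, no Spark. Function names are hypothetical.

def fit_line(xs, ys):
    # Closed-form ordinary least squares for y = a * x + b (one feature).
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def rmse(model, xs, ys):
    # Root-mean-square error of the fitted line on the given points.
    a, b = model
    return math.sqrt(sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def k_fold_cv(xs, ys, k):
    # Row i goes to fold i % k; each fold is held out once while the model
    # is trained on the remaining folds. Returns the mean held-out RMSE.
    scores = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(rmse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return sum(scores) / k

xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(fit_line(xs, ys))      # (2.0, 1.0)
print(k_fold_cv(xs, ys, 5))  # close to 0.0 for noise-free data
```

Swapping `fit_line` for another estimator leaves `k_fold_cv` unchanged. MLlib makes the same separation between an estimator and the tuning wrapper around it.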

Part 4

Spark GraphX

  • Pregel programming paradigm
  • VertexRDD, EdgeRDD
  • PageRank, Connected Components
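
The Pregel model runs in supersteps. In each superstep, active vertices send messages along their edges, messages addressed to the same vertex are merged, and a vertex program updates the vertex state. Computation stops when no messages are sent. The sketch below applies that loop to connected components by propagating the smallest vertex id, which is also the labelling GraphX's `connectedComponents` produces. It is written in plain Python, and the function name `connected_components` is hypothetical. In GraphX's `Pregel` operator, the three phases correspond to the `sendMsg`, `mergeMsg` and `vprog` functions.

```python
# Illustration only: plain-Python model of Pregel supersteps, no Spark.

def connected_components(vertices, edges):
    # Edges are treated as undirected; every vertex starts labelled with its own id.
    labels = {v: v for v in vertices}
    active = set(vertices)  # vertices whose label changed in the last superstep
    while active:
        # sendMsg + mergeMsg: active vertices offer their label to neighbours
        # holding a larger label; messages to the same vertex are merged with min.
        inbox = {}
        for src, dst in edges:
            for a, b in ((src, dst), (dst, src)):
                if a in active and labels[a] < labels[b]:
                    inbox[b] = min(inbox.get(b, labels[a]), labels[a])
        # vprog: each receiving vertex adopts the merged (smaller) label and
        # stays active for the next superstep; all others go idle.
        active = set()
        for v, msg in inbox.items():
            labels[v] = msg
            active.add(v)
    return labels

print(connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)]))
# {1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 6}
```

Labels only ever decrease, so the loop is guaranteed to terminate. The number of supersteps grows with the longest shortest path inside a component.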

Part 5

Spark Streaming

  • DStreams
  • Window functions, stateful operations
  • Sources and sinks
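
A DStream is a sequence of micro-batches, so window and stateful operations can be modelled on a plain list of batches. This sketch assumes that window length and slide are given as a number of batches. Spark requires both durations to be multiples of the batch interval. `windowed_counts` plays the role of `reduceByKeyAndWindow`, and `running_totals` plays the role of `updateStateByKey`. Both function names are hypothetical, and the code does not use Spark.

```python
from collections import Counter

# Illustration only: plain Python, no Spark. Each batch is a list of words.

def windowed_counts(batches, window, slide):
    # Emit one result every `slide` batches covering the last `window` batches;
    # early windows cover fewer batches until `window` batches have arrived.
    results = []
    for end in range(slide, len(batches) + 1, slide):
        counts = Counter()
        for batch in batches[max(0, end - window):end]:
            counts.update(batch)
        results.append(dict(counts))
    return results

def running_totals(batches):
    # Stateful operation: per-key state is carried from one batch to the next.
    state = Counter()
    snapshots = []
    for batch in batches:
        state.update(batch)
        snapshots.append(dict(state))
    return snapshots

batches = [["a"], ["a", "b"], ["b"], ["c"]]
print(windowed_counts(batches, window=2, slide=1))
# [{'a': 1}, {'a': 2, 'b': 1}, {'a': 1, 'b': 2}, {'b': 1, 'c': 1}]
print(running_totals(batches)[-1])
# {'a': 2, 'b': 2, 'c': 1}
```

The windowed version forgets batches once they slide out of the window, while the stateful version keeps everything it has seen. That unbounded state is why Spark requires checkpointing for `updateStateByKey`.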

Part 6

Spark Structured Streaming

  • Streaming Datasets
  • Watermarking, windowing, stateful operations
  • Output modes, sources and sinks
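
Watermarking decides when an event-time window can be finalised. The sketch below is a simplified plain-Python model with a hypothetical function name. Events are integer timestamps in arrival order and windows are tumbling. The watermark is the maximum event time seen minus `delay`, and in append mode a window is emitted once the watermark passes its end. Spark itself updates the watermark (configured with `withWatermark`) once per micro-batch, whereas this model updates it after every event. Spark also only guarantees that data older than the watermark may be dropped, not that it will be.

```python
# Illustration only: simplified plain-Python model, no Spark.

def tumbling_window_counts(events, window, delay):
    max_seen = float("-inf")
    watermark = float("-inf")
    open_windows = {}  # window start -> running count, not yet emitted
    emitted = []       # (window_start, count) pairs finalised in append mode
    dropped = []       # events that arrived after their window was finalised
    for t in events:
        start = t - t % window
        if start + window <= watermark:
            # The window's end is already behind the watermark: treat as too late.
            dropped.append(t)
            continue
        open_windows[start] = open_windows.get(start, 0) + 1
        # Watermark = largest event time seen so far minus the allowed delay.
        max_seen = max(max_seen, t)
        watermark = max_seen - delay
        # Append mode: emit each window once the watermark has passed its end.
        for s in sorted(open_windows):
            if s + window <= watermark:
                emitted.append((s, open_windows.pop(s)))
    return emitted, dropped, open_windows

emitted, dropped, pending = tumbling_window_counts([1, 4, 12, 16, 3, 27, 8], window=10, delay=5)
print(emitted)  # [(0, 2), (10, 2)]
print(dropped)  # [3, 8]
print(pending)  # {20: 1}
```

A larger `delay` tolerates later data but holds windows open for longer before they appear in append-mode output. This is the trade-off a watermark threshold controls.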
