Big data with Spark for engineers – basic workshop

Machine learning training

Prerequisites

Python syntax

Skills your team will gain

An understanding of the challenges in processing big data with Spark.

Experience in building Spark solutions with Spark SQL, MLlib, GraphX, and streaming.

Duration

2 days

Agenda

Part 1

Introduction to Spark

  • Functional programming in Scala
  • MapReduce paradigm in Spark
  • Broadcasts, accumulators, caching
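
As a small taste of the MapReduce paradigm listed above, here is a minimal pure-Python sketch of a word count with no Spark dependency. The names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are illustrative assumptions, not Spark APIs. In Spark, the same steps would roughly correspond to `flatMap`, a shuffle, and `reduceByKey`.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every line."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Group the emitted values by key, as a shuffle step would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values of each key with an associative operation (sum)."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in grouped.items()
    }


def word_count(lines):
    """Hypothetical helper chaining the three phases together."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

Because the reduce step uses an associative operation, the per-key sums could be computed on separate partitions and merged afterwards, which is the property a distributed engine relies on.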

Part 2

Spark SQL

  • DataFrames
  • RDDs vs DataFrames vs Datasets
  • User-defined functions (UDFs)

Part 3

Data science in Spark – Spark MLlib

  • Managing and tracking experiments – ML Pipelines
  • Linear regression, random forests
  • Cross-validation and evaluation on Spark

Part 4

Spark GraphX

  • Pregel programming paradigm
  • VertexRDD, EdgeRDD
  • PageRank, Connected Components
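
To give a feel for the PageRank topic listed above, here is a minimal pure-Python sketch of the iterative algorithm on a small edge list. It is not GraphX's implementation: the function name `pagerank`, the default damping factor of 0.85, the fixed iteration count, and the choice to normalize ranks so they sum to 1 are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Iteratively compute normalized PageRank for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all nodes.
    rank = {node: 1.0 / len(nodes) for node in nodes}

    for _ in range(iterations):
        # Every node receives the "teleport" share first.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank
```

Each iteration only needs a node's current rank and its outgoing edges, which is why the computation maps naturally onto a vertex-centric, message-passing model such as Pregel.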

Part 5

Spark Streaming

  • DStreams
  • Window functions, stateful operations
  • Sources and sinks

Part 6

Spark Structured Streaming

  • Streaming Datasets
  • Watermarking, windowing, stateful operations
  • Output modes, sources and sinks
