The goal of the mentoring program is to create an end-to-end Spark machine learning pipeline that will run on your cluster. The pipeline will read your data, build a model and save it for production use.
The mentoring program includes:
- A dedicated mentor (an experienced data scientist)
- Eight remote sessions (twice a week)
- E-mail contact with your mentor between sessions
- A final report – a summary of the teamwork and recommendations for further team development
- Additional materials (recommended articles, books, blog posts, tools, etc.)
The two weekly remote mentoring sessions cover:
- Loading data into Spark
- Cleaning your dataset
- Training and optimizing the model
- Saving and productionizing the model
- Monitoring and optimizing the Spark application
- Summary and conclusions
Because mentoring is most effective when you work on your own use case, we encourage you to bring your own image dataset. If you prefer, however, we can provide an interesting and challenging problem to solve, along with a dataset.