October 23-26, 2017 - Prague, Czech Republic
Wednesday, October 25 • 17:05 - 17:45
Building Robust Streaming Data Pipelines with Apache Spark - Zak Hassan, Red Hat

There are challenges in architecting a solution that lets developers stream data into Kafka while managing dirty data, which is a perennial issue in ETL pipelines. I'd like to share lessons learned and demonstrate how Apache Kafka, Apache Spark, and Apache Camel can be put together to give developers a continuous data pipeline for their Spark applications. Without data, it is very difficult to take advantage of Spark's full capabilities. Companies often have their data spread across many different systems, and Apache Camel allows developers to extract, transform, and load that data into many destinations; Apache Kafka is one example. Kafka is great for aggregating data in a centralized location, and Spark already comes with a built-in connector for Kafka. I'll also explain lessons learned from running these technologies inside Docker.
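As a rough illustration of the Kafka-to-Spark leg of such a pipeline, the sketch below uses Spark Structured Streaming's built-in Kafka source (Spark 2.x) to consume a topic and print cleaned records to the console. The broker address, topic name, and object name are placeholder assumptions for illustration, not details from the talk.

```scala
import org.apache.spark.sql.SparkSession

// Minimal sketch: requires the spark-sql-kafka-0-10 package on the classpath.
object KafkaToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-streaming-sketch")
      .getOrCreate()

    // Continuously read records from a Kafka topic
    // ("localhost:9092" and "events" are placeholders).
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers keys and values as bytes; cast to strings
    // before any further cleansing or transformation.
    val records = stream.selectExpr(
      "CAST(key AS STRING) AS key",
      "CAST(value AS STRING) AS value")

    // Write the stream to the console for demonstration purposes.
    val query = records.writeStream
      .format("console")
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}
```

In a fuller pipeline along the lines described above, an Apache Camel route would feed the same Kafka topic from upstream systems, and the Spark job would apply the cleansing logic before writing to its sink.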

Speakers
Zak Hassan

Senior Software Engineer - AI/ML CoE, CTO Office, Red Hat Inc.
Currently focused on developing an analytics platform on OpenShift and leveraging open source ML frameworks: Apache Spark, TensorFlow, and more. Designing a high-performance, scalable ML platform that exposes metrics through cloud-native technologies: Prometheus and Kubernetes.


Wednesday, October 25, 2017 17:05 - 17:45 CEST
Rokoska