Beegee Alop

12/12/2022, 10:00 AM
How much of the cloud do you use for data engineering? My company uses AWS, and the data engineers doubled down on the cloud. We use Lambda, DynamoDB, S3 (of course), Kinesis, Firehose, SQS, EventBridge, etc. Our Airflow can invoke:
• Lambdas
• containers
• Databricks
• EMR
👀 2

Oz Katz

12/12/2022, 10:25 AM
That's a great question! I'll try and give my 2 cents: I think there are different "tiers" that we can look at. There's a difference between running e.g. Spark using:
• Glue (serverless)
• EMR (managed servers)
• EC2 + Apache Spark (just VMs)
• Bare metal + Apache Spark (traditional hosting)
Typically I'd go for the most managed approach (i.e. serverless in that case) when possible, and go to lower "tiers" only when that is required. This could be due to me needing more flexibility and configuration, tighter security, lower cost, etc. The beauty of pay-for-what-you-use services is that evaluating and introducing new technologies becomes so much easier. OSS tools have the added benefit that you can consume them at different tiers without being coupled to a specific implementation.
@Beegee Alop would love your thoughts on that 🙂
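A rough sketch of the "most managed tier first" rule from the message above, in Python. The `Requirements` fields and the idea of scoring each need by the lowest tier that satisfies it are assumptions made for illustration, not something from the thread; only the four tier names come from the list above.

```python
from dataclasses import dataclass

# Tiers from the list above, ordered from most managed to least managed.
TIERS = [
    "Glue (serverless)",
    "EMR (managed servers)",
    "EC2 + Apache Spark (just VMs)",
    "Bare metal + Apache Spark (traditional hosting)",
]


@dataclass
class Requirements:
    # Hypothetical inputs: for each concern, the index of the most managed
    # tier that still satisfies it (0 means serverless is good enough).
    flexibility: int = 0
    security: int = 0
    cost: int = 0


def pick_tier(req: Requirements) -> str:
    """Return the most managed tier that satisfies every requirement."""
    # Drop to a lower tier only as far as the strictest requirement forces us.
    index = max(req.flexibility, req.security, req.cost)
    if not 0 <= index < len(TIERS):
        raise ValueError(f"tier index out of range: {index}")
    return TIERS[index]
```

With no special requirements, `pick_tier(Requirements())` stays on the serverless tier; a single stricter need, such as `Requirements(security=2)`, is what pushes the choice down to plain VMs.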

Beegee Alop

12/12/2022, 10:44 AM
Full agreement here. Managed services avoid or delay the need for specialized expertise to maintain the infrastructure. In your example, you'll need to consider hiring that expertise when using EKS or EC2s. Otherwise, your data engineers end up paying the overhead themselves ("yak shaving"). "Pay for what you use" also makes it quicker to realize the value of a new tech. Thanks for a great response.
🙌 1