By Data Engineers for Data Engineers!

This is not a site for professional teachers, talkers, paper engineers, or those who are experts in tech but have never built anything. We are on the front lines daily, building data pipelines, streaming apps, and big data analytics. We simply share the code, tutorials, tips, and tricks that actually work in the field, not on paper.

Contact us with questions or feedback on articles, and as always, feel free to contribute code or articles.

Joe Trite
2298 Software

Tech Articles

Build a Custom GPT Model

Learn how to create a ChatGPT model for your company.

Change Data Capture with Apache Spark

Many teams give up on Change Data Capture (CDC) on Big Data due to its complexity. This tutorial provides a simple pattern to reliably capture changes.

How much do you pay for a line of code?

Have you ever been asked why it takes so long to complete an IT project, or to provide an accurate estimate for one?

Upserting S3 Data Using Hudi & PySpark

Apache Hudi is a powerful framework for managing storage of large datasets like those typically found in Amazon S3.

Synchronizing S3 Data with Hudi

How to create a large, multi-tenant, enterprise-grade Hadoop Data Lake

Hadoop to AWS Migration Plan

Learn how to migrate an enterprise Hadoop platform to AWS.

Data Engineering Best Practices

A list of best practices taken from 20+ years working in IT.

Dockerize HTTP Service Latency Monitor

An HTTP service monitor that captures performance statistics for a provided list of URLs.

Infrastructure as Diagram

Many shops are adopting Infrastructure as Code (IaC), but what if you could skip the code and create your infrastructure from a diagram?

Enterprise Hadoop

How to create a large, multi-tenant, enterprise-grade Hadoop Data Lake

Spark Performance Guide

A guide on common performance problems faced by engineers and how to fix them.

Setting Up a Confluent Kafka Docker Cluster Tutorial

Learn how to create a full Confluent Kafka cluster on your laptop using Docker.

PySpark Structured Streaming with Confluent Kafka

Learn how to stream data with a Confluent Kafka cluster using PySpark.

Spark on Windows

Learn how to set up Apache Spark on a Windows computer.

Load Forecasting with Prophet

A load forecasting example using Facebook's Prophet framework.