The podcast about Python and the people who make it great
This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.
Data Engineering Podcast
Unlocking The Potential Of Streaming Data Applications Without The Operational Headache At Grainite
1:13:33
Summary The promise of streaming data is that it allows you to react to new information as it happens, rather than introducing latency by batching records together. The peril is that building a robust and scalable streaming architecture is always more complicated and error-prone than you think it's going to be. After experiencing this unfortunate r…
Data Engineering Podcast
Aligning Data Security With Business Productivity To Deploy Analytics Safely And At Speed
51:38
Summary As with all aspects of technology, security is a critical element of data applications, and the different controls can be at cross purposes with productivity. In this episode Yoav Cohen from Satori shares his experiences as a practitioner in the space of data security and how to align with the needs of engineers and business users. He also …
Data Engineering Podcast
Use Your Data Warehouse To Power Your Product Analytics With NetSpring
49:21
Summary With the rise of the web and digital business came the need to understand how customers are interacting with the products and services that are being sold. Product analytics has grown into its own category and brought with it several services with generational differences in how they approach the problem. NetSpring is a warehouse-native pro…
Data Engineering Podcast
Exploring The Nuances Of Building An Intentional Data Culture
45:44
Summary The ecosystem for data professionals has matured to the point that there are a large and growing number of distinct roles. With the scope and importance of data steadily increasing it is important for organizations to ensure that everyone is aligned and operating in a positive environment. To help facilitate the nascent conversation about w…
Summary There has been a lot of discussion about the practical application of data mesh and how to implement it in an organization. Jean-Georges Perrin was tasked with designing a new data platform implementation at PayPal and wound up building a data mesh. In this episode he shares that journey and the combination of technical and organizational c…
Data Engineering Podcast
The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse
55:06
Summary Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. Projects like Apache Iceberg provide a viable alternative in the …
Data Engineering Podcast
Let The Whole Team Participate In Data With The Quilt Versioned Data Hub
52:02
Summary Data is a team sport, but it's often difficult for everyone on the team to participate. For a long time the mantra of data tools has been "by developers, for developers", which automatically excludes a large portion of the business members who play a crucial role in the success of any data project. Quilt Data was created as an answer to mak…
Data Engineering Podcast
Reflecting On The Past 6 Years Of Data Engineering
32:21
Summary This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes. Announcements He…
Data Engineering Podcast
Let Your Business Intelligence Platform Build The Models Automatically With Omni Analytics
50:43
Summary Business intelligence has gone through many generational shifts, but each generation has largely maintained the same workflow. Data analysts create reports that are used by the business to understand and direct the business, but the process is very labor and time intensive. The team at Omni have taken a new approach by automatically buildin…
Data Engineering Podcast
Safely Test Your Applications And Analytics With Production Quality Data Using Tonic AI
45:40
Summary The most interesting and challenging bugs always happen in production, but recreating them is a constant challenge due to differences in the data that you are working with. Building your own scripts to replicate data from production is time consuming and error-prone. Tonic is a platform designed to solve the problem of having reliable, prod…
Data Engineering Podcast
Building Applications With Data As Code On The DataOS
48:36
Summary The modern data stack has made it more economical to use enterprise grade technologies to power analytics at organizations of every scale. Unfortunately it has also introduced new overhead to manage the full experience as a single workflow. At the Modern Data Company they created the DataOS platform as a means of driving your full analytics…
Data Engineering Podcast
Automate Your Pipeline Creation For Streaming Data Transformations With SQLake
44:05
Summary Managing end-to-end data flows becomes complex and unwieldy as the scale of data and its variety of applications in an organization grows. Part of this complexity is due to the transformation and orchestration of data living in disparate systems. The team at Upsolver is taking aim at this problem with the latest iteration of their platform …
Data Engineering Podcast
Increase Your Odds Of Success For Analytics And AI Through More Effective Knowledge Management With AlignAI
59:21
Summary Making effective use of data requires proper context around the information that is being used. As the size and complexity of your organization increases the difficulty of ensuring that everyone has the necessary knowledge about how to get their work done scales exponentially. Wikis and intranets are a common way to attempt to solve this pr…
Data Engineering Podcast
Using Product Driven Development To Improve The Productivity And Effectiveness Of Your Data Teams
58:45
Summary With all of the messaging about treating data as a product it is becoming difficult to know what that even means. Vishal Singh is the head of products at Starburst which means that he has to spend all of his time thinking and talking about the details of product thinking and its application to data. In this episode he shares his thoughts on…
Data Engineering Podcast
Simple And Scalable Encryption Of Data In Use For Analytics And Machine Learning With Opaque Systems
1:08:25
Summary Encryption and security are critical elements in data analytics and machine learning applications. We have well developed protocols and practices around data that is at rest and in motion, but security around data in use is still severely lacking. Recognizing this shortcoming and the capabilities that could be unlocked by a robust solution …
Data Engineering Podcast
An Exploration Of Tobias' Experience In Building A Data Lakehouse From Scratch
1:11:59
Summary Five years of hosting the Data Engineering Podcast has provided Tobias Macey with a wealth of insight into the work of building and operating data systems at a variety of scales and for myriad purposes. In order to condense that acquired knowledge into a format that is useful to everyone Scott Hirleman turns the tables in this episode and a…
Data Engineering Podcast
Revisit The Fundamental Principles Of Working With Data To Avoid Getting Caught In The Hype Cycle
1:05:29
Summary The data ecosystem has seen a constant flurry of activity for the past several years, and it shows no signs of slowing down. With all of the products, techniques, and buzzwords being discussed it can be easy to be overcome by the hype. In this episode Juan Sequeda and Tim Gasper from data.world share their views on the core principles that …
Data Engineering Podcast
Making Sense Of The Technical And Organizational Considerations Of Data Contracts
47:00
Summary One of the reasons that data work is so challenging is because no single person or team owns the entire process. This introduces friction in the process of collecting, processing, and using data. In order to reduce the potential for broken pipelines some teams have started to adopt the idea of data contracts. In this episode Abe Gong brings…
Data Engineering Podcast
Convert Your Unstructured Data To Embedding Vectors For More Efficient Machine Learning With Towhee
53:45
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Data is one of the core ingredients for machine learning, but the format in which it is understandable to humans is not a useful representation for models. Embedding vectors are a way to s…
The Python Podcast.__init__
Update Your Model's View Of The World In Real Time With Streaming Machine Learning Using River
1:16:22
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary The majority of machine learning projects that you read about or work on are built around batch processes. The model is trained, and then validated, and then deployed, with each step being…
Data Engineering Podcast
Run Your Applications Worldwide Without Worrying About The Database With Planetscale
49:40
Summary One of the most critical aspects of software projects is managing their data. Managing the operational concerns for your database can be complex and expensive, especially if you need to scale to large volumes of data, high traffic, or geographically distributed usage. Planetscale is a serverless option for your MySQL workloads that lets you f…
Data Engineering Podcast
Business Intelligence In The Palm Of Your Hand With Zing Data
46:46
Summary Business intelligence is the foremost application of data in organizations of all sizes. The typical conception of how it is accessed is through a web or desktop application running on a powerful laptop. Zing Data is building a mobile native platform for business intelligence. This opens the door for busy employees to access and analyze the…
Data Engineering Podcast
Adopting Real-Time Data At Organizations Of Every Size
50:24
Summary The term "real-time data" brings with it a combination of excitement, uncertainty, and skepticism. The promise of insights that are always accurate and up to date is appealing to organizations, but the technical realities to make it possible have been complex and expensive. In this episode Arjun Narayan explains how the technical barriers t…
The Python Podcast.__init__
Declarative Machine Learning For High Performance Deep Learning Models With Predibase
59:22
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Deep learning is a revolutionary category of machine learning that accelerates our ability to build powerful inference models. Along with that power comes a great deal of complexity in det…
The Python Podcast.__init__
Build Better Machine Learning Models With Confidence By Adding Validation With Deepchecks
47:36
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Machine learning has the potential to transform industries and revolutionize business capabilities, but only if the models are reliable and robust. Because of the fundamental probabilistic…
Data Engineering Podcast
Supporting And Expanding The Arrow Ecosystem For Fast And Efficient Data Processing At Voltron Data
50:25
Summary The data ecosystem has been growing rapidly, with new communities joining and bringing their preferred programming languages to the mix. This has led to inefficiencies in how data is stored, accessed, and shared across process and system boundaries. The Arrow project is designed to eliminate wasted effort in translating between languages, a…
Data Engineering Podcast
Analyze Massive Data At Interactive Speeds With The Power Of Bitmaps Using FeatureBase
59:24
Summary The most expensive part of working with massive data sets is the work of retrieving and processing the files that contain the raw information. FeatureBase (formerly Pilosa) avoids that overhead by converting the data into bitmaps. In this episode Matt Jaffee explains how to model your data as bitmaps and the benefits that this representatio…
The Python Podcast.__init__
Build A Full Stack ML Powered App In An Afternoon With Baseten
45:22
Preamble This is a cross-over episode from our new show The Machine Learning Podcast, the show about going from idea to production with machine learning. Summary Building an ML model is getting easier than ever, but it is still a challenge to get that model in front of the people that you built it for. Baseten is a platform that helps you quickly g…
Data Engineering Podcast
A Look At The Data Systems Behind The Gameplay For League Of Legends
1:01:29
Summary The majority of blog posts and presentations about data engineering and analytics assume that the consumers of those efforts are internal business users accessing an environment controlled by the business. In this episode Ian Schweer shares his experiences at Riot Games supporting player-focused features such as machine learning models and …
Data Engineering Podcast
Tame The Entropy In Your Data Stack And Prevent Failures With Sifflet
46:46
Summary The problems that are easiest to fix are the ones that you prevent from happening in the first place. Sifflet is a platform that brings your entire data stack into focus to improve the reliability of your data assets and empower collaboration across your teams. In this episode CEO and founder Salma Bakouk shares her views on the causes and …
Data Engineering Podcast
Taking A Look Under The Hood At CreditKarma's Data Platform
52:02
Summary CreditKarma builds data products that help consumers take advantage of their credit and financial capabilities. To make that possible they need a reliable data platform that empowers all of the organization’s stakeholders. In this episode Vishnu Venkataraman shares the journey that he and his team have taken to build and evolve their system…
Data Engineering Podcast
Build Data Products Without A Data Team Using AgileData
1:12:29
Summary Building data products is an undertaking that has historically required substantial investments of time and talent. With the rise in cloud platforms and self-serve data technologies the barrier of entry is dropping. Shane Gibson co-founded AgileData to make analytics accessible to companies of all sizes. In this episode he explains the desi…
The Python Podcast.__init__
Skip Straight To The Fun Part Of Your Project With PyScaffold
57:46
Summary Starting a new project is always exciting and full of possibility, until you have to set up all of the repetitive boilerplate. Fortunately there are useful project templates that eliminate that drudgery. PyScaffold goes above and beyond simple template repositories, and gives you a toolkit for different application types that are packed wit…