Data Pipelines Overview

See the full post here ➡️ https://ubuntu.com//blog/data-pipelines-overview

Excerpt below:

A data pipeline is a series of processes that collect raw data from various sources, filter out the records that do not qualify, transform the data into the appropriate format, move it to the place where you want to store it, analyze it, and finally present it to your audience.

As the chart in the original post shows, a data pipeline is analogous to a flow of water: data moves from one stage to the next while being processed and reshaped. In some cases, data needs to loop back to a previous stage or be processed multiple times in the same stage.
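The flow described above maps naturally onto shell pipes, where each command hands its output to the next. The following is only a minimal sketch under stated assumptions: the sample records, the "<drone_id> <battery_percent>" line format, and the function names collect, filter, transform, and analyze are all hypothetical and do not come from the original post.

```shell
#!/bin/sh
# Collect: emit hypothetical sample records, "<drone_id> <battery_percent>".
# One line is deliberately malformed so the filter stage has work to do.
collect() {
    printf '%s\n' 'drone-1 87' 'drone-2 n/a' 'drone-1 91' 'drone-3 64'
}

# Filter: keep only lines whose second field is a whole number.
filter() {
    grep -E '^[^ ]+ [0-9]+$'
}

# Transform: convert the space-separated fields to CSV.
transform() {
    awk '{ print $1 "," $2 }'
}

# Analyze: average battery level per drone, sorted by drone id.
analyze() {
    awk -F, '{ sum[$1] += $2; n[$1]++ }
             END { for (d in sum) printf "%s %.1f\n", d, sum[d] / n[d] }' | sort
}

# Store: tee writes the cleaned records to a file while passing them on.
store_file=$(mktemp)
report=$(collect | filter | transform | tee "$store_file" | analyze)

# Present: print the per-drone averages.
echo "$report"
```

With the sample input, the malformed drone-2 record is dropped, the three remaining records are stored as CSV, and the report lists one average per remaining drone. A real pipeline would replace each function with a dedicated service, but the hand-off between stages works the same way.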

For example, a data pipeline may ingest log data from thousands of drones running Ubuntu Core or ROS. Those logs could be written to Google Cloud Storage. You can then create a SQL database on your virtual machines. Ubuntu Pro on Google Cloud is a good operating system for hosting PostgreSQL, a relational database management system that uses the SQL query language, and Ubuntu Pro ensures that your PostgreSQL server gets security updates. The following two commands, started from Google Cloud Shell, will help you launch a PostgreSQL server on Google Cloud:

$ gcloud compute instances create [YOUR_MACHINE_NAME] --zone=[YOUR_ZONE] --machine-type=[YOUR_MACHINE_TYPE] --image=projects/ubuntu-os-pro-cloud/global/images/ubuntu-pro-2004-focal-v20210720

SSH into the machine and install PostgreSQL:

$ sudo apt install postgresql postgresql-contrib
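Once the server is installed, log records need to reach a table. As a rough sketch only, the helper below turns CSV log lines into SQL statements. The table name drone_logs, its columns, the sample records, and the function name logs_to_sql are hypothetical and are not part of the original post. The sketch also assumes that drone ids contain no quote characters; untrusted input would need proper escaping or a bulk-load mechanism instead.

```shell
#!/bin/sh
# Convert "<drone_id>,<battery_percent>" lines on stdin into SQL statements.
# Table and column names are hypothetical; adjust them to your own log schema.
logs_to_sql() {
    echo 'CREATE TABLE IF NOT EXISTS drone_logs (drone_id TEXT, battery_percent INTEGER);'
    # \047 is the octal escape for a single quote inside an awk string.
    awk -F, '{ printf "INSERT INTO drone_logs VALUES (\047%s\047, %d);\n", $1, $2 }'
}

# Hypothetical sample records standing in for real drone logs.
sql=$(printf '%s\n' 'drone-1,87' 'drone-3,64' | logs_to_sql)
echo "$sql"

# Assumption: on the Ubuntu server, the same output could be piped into
# PostgreSQL with something like:  ... | logs_to_sql | sudo -u postgres psql
```

Generating the SQL as plain text keeps the sketch runnable without a database, and lets you inspect the statements before sending them to the server.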

Let’s dive deeper into these four stages of the data pipeline: Ingestion, Transformation, Storage, and Analysis.

Ingestion

Ingestion is the process of bringing data into your working environment. You could either ingest all your data

...

