Databases, Data Warehouses, Data Lakes, and Data Marts and Their Impact on Carbon Footprints (CO2)


Data is the buzzword of today's world. It is being created all around us, from the time we wake up until we go to bed.

Ever wonder how the enormous volumes of data generated every day are stored and managed? With this huge inflow of data comes the challenge of managing it efficiently. Data centers help solve that problem by letting us store data, retrieve meaningful information, and use the data when necessary.

Let's look at how the whole process works and what role we play as users and generators of data. While handling data, you have almost certainly come across terms like

  1. Databases
  2. Data Warehouses
  3. Data Lakes
  4. Data Marts 

For the benefit of newcomers, and as a refresher for those already familiar with this jargon, let's define each of these terms.

 

1. Databases and DBMS (Database Management Systems): 

A database is an organized collection of structured data, stored electronically on a computer system so that it can be easily accessed, managed, updated, and retrieved.

Advantages of Databases:

  • Minimal data redundancy.
  • Improved data security.
  • Increased consistency.
  • Fewer update errors.
  • Lower costs for data entry, data storage, and data retrieval.
  • Enhanced data access via host and query languages.
  • Better data integrity for application programs.

A database management system (DBMS) usually controls a database. A DBMS can improve your data processes and increase the business value of your organization's data assets, freeing users across the organization from repetitive and time-consuming data processing tasks. The result? A more productive workforce, better compliance with data regulations, and better decisions.

As an illustration, manufacturing companies make and sell products every day, and a DBMS is used to maintain records of all these transactions. Likewise, railway and airline reservation systems rely on a DBMS to keep records such as flight arrival, departure, and delay status.
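To make the airline example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The flights table, its column names, and the sample rows are all made up for illustration; a real reservation system would be far more elaborate.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per flight with its schedule and delay.
conn.execute(
    """
    CREATE TABLE flights (
        flight_no     TEXT PRIMARY KEY,
        departure     TEXT NOT NULL,
        arrival       TEXT NOT NULL,
        delay_minutes INTEGER NOT NULL DEFAULT 0
    )
    """
)

# Made-up sample records.
conn.executemany(
    "INSERT INTO flights VALUES (?, ?, ?, ?)",
    [
        ("AB101", "2023-10-01 08:00", "2023-10-01 10:15", 0),
        ("AB202", "2023-10-01 09:30", "2023-10-01 12:00", 45),
        ("AB303", "2023-10-01 11:00", "2023-10-01 13:20", 10),
    ],
)
conn.commit()


def delayed_flights(connection, threshold_minutes):
    """Return (flight_no, delay_minutes) for flights delayed beyond the threshold."""
    rows = connection.execute(
        "SELECT flight_no, delay_minutes FROM flights "
        "WHERE delay_minutes > ? ORDER BY delay_minutes DESC",
        (threshold_minutes,),
    )
    return rows.fetchall()


print(delayed_flights(conn, 5))  # [('AB202', 45), ('AB303', 10)]
```

The DBMS handles the storage and the query, so the application only has to state what it wants: flights delayed by more than five minutes.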

 

Here is a list of common types of databases:

  1. Relational databases.
  2. Network databases.
  3. Object-oriented databases.
  4. Graph databases.
  5. ER model databases.
  6. Document databases.
  7. NoSQL databases.
  8. Hierarchical databases.

 

2. Data Warehouse: 

A data warehouse stores structured, filtered data that has already been processed for a specific purpose.

By storing only processed data, data warehouses conserve expensive storage space, because they do not keep data that may never be needed. Processed data is also easier for a wider audience to analyze.

 

 

3. Data Lake: 

A data lake is a large pool of raw data, much of it unstructured, whose purpose has not yet been defined. Data warehouses hold processed and refined data, whereas data lakes typically store raw data.

Data lakes need a lot more storage space than data warehouses do.

On the other hand, unprocessed raw data is malleable, can be quickly analyzed for any purpose, and is excellent for machine learning.

The drawback of raw data is that, without adequate data governance and data quality standards, data lakes can turn into data swamps.

There is an emerging architectural trend, sometimes called the data lakehouse, that combines the management capabilities of a data warehouse with the flexibility of a data lake.

Let's look at the differences between a data warehouse and a data lake:

  • A data lake is a central repository that lets you store data from all sources, in any format and at any size, whereas a data warehouse stores structured data.
  • Although both data lakes and data warehouses are frequently used to store massive amounts of data, the terms are not interchangeable.
  • Data is processed before it is added to a data warehouse and arranged into a single schema. Data lakes, by contrast, store raw and unstructured data.
  • In a warehouse, the data is cleaned before analysis; in a lake, the data is selected and organized as needed.
  • The individual data elements in a data lake do not all have the same purpose. The lake receives raw data without a specific purpose in mind, so filtering and organization are less strict.
  • Compared with a data warehouse, a data lake offers more storage options, is more complex, and serves a wider variety of use cases.
  • Processed data is raw data that has been transformed for a particular use. Because data warehouses store only processed data, everything in them has already been put to a specific use within the organization. Storage space is therefore well optimized and not wasted on data that will never be used.
  • Data lakes are often difficult to navigate for those unfamiliar with unprocessed data. Turning raw, unstructured data into something that serves a specific business purpose usually takes a data scientist and specialized tools.
  • However, there is an emerging trend of data preparation tools that give users self-service access to the information stored in data lakes.
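The contrast between keeping raw data and keeping only processed data can be sketched in a few lines of Python. This is a toy illustration, not how any particular product works: the raw_events list stands in for a lake, an in-memory SQLite table stands in for a warehouse, and all records and names are invented.

```python
import json
import sqlite3

# "Lake": raw events kept exactly as they arrived, in mixed shapes (made-up data).
raw_events = [
    '{"user": "u1", "amount": "19.99", "currency": "USD"}',
    '{"user": "u2", "amount": 5, "note": "gift card"}',
    '{"device": "sensor-7", "temp_c": 21.4}',
]


def load_sales(raw_lines):
    """Pick out and clean only the records this particular analysis needs."""
    cleaned = []
    for line in raw_lines:
        record = json.loads(line)
        if "user" in record and "amount" in record:
            cleaned.append((record["user"], float(record["amount"])))
    return cleaned


# "Warehouse": a fixed schema that only accepts the processed rows.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (user_id TEXT NOT NULL, amount REAL NOT NULL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?)", load_sales(raw_events))

total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(len(raw_events), "raw records in the lake")
print(round(total, 2), "total sales in the warehouse")
```

The lake keeps all three records, including the sensor reading that this analysis ignores, while the warehouse table holds only the two cleaned sales rows. That is the storage trade-off described above in miniature.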

 

4. Data Mart: 

A data mart is a curated subset of data, usually created for analytics and business intelligence users.

Data marts are often built as a repository of pertinent information for a subgroup of workers or a particular use case.

Because they are smaller and more specialized, data marts can also be cheaper to store and faster to analyze.
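As a rough sketch of the idea, a data mart can be pictured as a smaller table carved out of the warehouse for one team. The example below uses Python's built-in sqlite3 module; the warehouse_orders and marketing_mart tables and their contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical enterprise-wide warehouse table covering several departments.
conn.execute(
    "CREATE TABLE warehouse_orders "
    "(order_id INTEGER, department TEXT, region TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_orders VALUES (?, ?, ?, ?)",
    [
        (1, "marketing", "EU", 120.0),
        (2, "finance", "US", 300.0),
        (3, "marketing", "US", 80.0),
        (4, "hr", "EU", 50.0),
    ],
)

# The "mart": a separate table holding only the rows and columns one team needs.
conn.execute(
    """
    CREATE TABLE marketing_mart AS
    SELECT order_id, region, revenue
    FROM warehouse_orders
    WHERE department = 'marketing'
    """
)

mart_rows = conn.execute(
    "SELECT order_id, region, revenue FROM marketing_mart ORDER BY order_id"
).fetchall()
print(mart_rows)  # [(1, 'EU', 120.0), (3, 'US', 80.0)]
```

Queries from the marketing team now touch two rows instead of four. At real scale, that reduction is where the storage and speed benefits come from.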

 

 

Let's look at the differences between a data warehouse and a data mart:

  • A data mart is limited to a single focus for one line of business.
  • A data warehouse often covers multiple areas and is enterprise-wide.
  • A data mart holds data from just a few sources, whereas a data warehouse stores data from many sources.
  • A data mart is typically smaller than 100 GB, whereas a data warehouse is typically larger than 100 GB and often a terabyte or more.
  • Slow and overloaded data warehouses are often the underlying reason for creating data marts, and the warehouse then serves as the marts' underlying data source.
  • As data volumes and analytics use cases expand, organizations often cannot support every analytics use case without degrading the performance of their data warehouse, so they export a subset of the data to a mart for analytics.

Snowflake and the need for data marts: Snowflake's cloud data architecture is highly elastic and is designed to scale to very large volumes of data and numbers of users. Additional compute resources can be spun up quickly to address new use cases without affecting other operations running on the database, which can remove the need to spin off separate physical data marts just to maintain acceptable performance.

 

 

Environmental Impact of Data Storage 

By some estimates, data centers account for 2.5% to 3.7% of all greenhouse gas emissions.

That would put emissions from data centers above those of the airline industry (2.4%) and other major economic drivers.

Data storage has a variety of environmental effects, including:

1. Carbon Emissions: Roughly 0.3% of total CO2 emissions are attributable to data storage. These emissions come from energy use and day-to-day operations.

2. E-Waste: Data storage generates a sizable amount of electronic waste (e-waste). E-waste is toxic. It does not biodegrade, so it builds up in the ecosystem and degrades a region's soil and air quality.

3. Battery Backups: Data centers use batteries as a backup in the event of a power outage. These batteries contain poisonous, corrosive, and dangerous substances such as lead, lithium, mercury, and cadmium, so when they are disposed of and end up in landfills, they begin to harm the environment.

4. Coolant: Coolants are needed for Computer Room Air Conditioning (CRAC) in data centers located where free cooling or indirect evaporative cooling is not an option. Liquid cooling also relies on chemical coolants. Chlorofluorocarbons (CFCs), other halocarbons, and Freon-type refrigerants have commonly been used. These substances range in toxicity from low to high, and when released they can deplete the ozone layer. Because they trap heat in the atmosphere, they can also contribute to global warming.

5. Cleaning Supplies: Data centers must be kept free of dust and dirt to operate effectively, since both are enemies of computer equipment, and specialized cleaning solutions are the usual way to remove them. Many of these solutions are harmful because they contain bleach, ammonia, or chlorine. These chemicals affect human, marine, and other natural life. Some chlorine-containing compounds are also linked to ozone loss in the atmosphere, which increases the amount of ultraviolet (UV) light reaching the earth's surface.

6. Electronic Waste: Computing equipment has a limited lifespan, and servers typically need replacing every three to five years. On top of these scheduled replacements, there are failed hard drives, worn bearings, and broken monitors to dispose of.

 

 

How to reduce Data Center Carbon Footprints

Climate impact is a pressing concern for data centers.

According to government estimates, a data center uses 10 to 50 times as much energy per square foot as a typical commercial building. Figures on water use are unreliable and not always published, which makes the full footprint even harder to calculate.

 

 

A carbon-neutral data center is one whose net carbon emissions are zero, typically achieved through energy-efficient technology and clean energy. Carbon-neutral data centers play a key role in the IT industry's quest for sustainability. An energy-efficient data center:

  • Consumes less energy.
  • Reduces spending on data storage and processing.
  • Reduces the environmental impact of data centers.

Hyperscale data centers are also significantly more efficient than internal (on-premises) data centers.

 

 

The major areas of improvement for reducing data center carbon footprints:

1. Remote Management and Truck Rolls: Truck rolls are the traditional method of troubleshooting issues at data centers, in which a technician visits the site to look at the problem. The technician may have to be flown there, and each visit is estimated to cost hundreds of dollars and has a significant environmental impact.

Because a surprising number of these inspections find no problem, that impact is often incurred for no reason.

For this reason, remote management capabilities are one of the essential elements in lowering the carbon footprint of data centers. Network engineers can access the data center software from any remote location without the need to fly in a specialist.

Letting network experts take care of data center problems remotely eliminates:

  • The requirement to physically transport technicians to the center.
  • The environmental, financial, and time burden of those trips.

 

2. Data Center Consolidation Strategies: 

  • Invest in new machinery: newer equipment is more energy efficient and performs better.
  • Spend money on renewable energy: wind turbines and solar panels are low maintenance, cost effective, and reduce CO2 emissions. Nuclear power is also an effective low-carbon energy solution.
  • Spend money on cooling methods: minimize environmental impact with free cooling techniques, such as using outside water and air to cool the water and air in cold aisle corridors.
  • Turn off inactive servers: powering servers down during off-peak hours, when traffic slows, saves about 10-15% of energy and reduces CO2 emissions.
  • Use liquid immersion cooling: with it, data centers can cut 90% of their cooling needs. This not only prevents tons of CO2 emissions but is also cost effective.
  • Mitigate server inefficiencies: servers are under high strain to ensure data availability. Many data centers have already taken major steps to reduce this inefficiency by identifying "zombie servers" (servers that draw power while doing no useful work) and adopting server virtualization.
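Identifying zombie servers is, at its simplest, a matter of flagging machines whose utilization stays near zero. The sketch below shows one way that check might look. The ServerStats fields, the 2% CPU threshold, and the fleet figures are all assumptions for illustration, not values from any real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    name: str
    avg_cpu_percent: float   # average CPU utilization over the observation window
    avg_power_watts: float   # average power draw over the same window


def find_zombie_servers(servers, cpu_threshold=2.0):
    """Return servers whose average CPU utilization is below the threshold."""
    return [s for s in servers if s.avg_cpu_percent < cpu_threshold]


def potential_saving_fraction(servers, zombies):
    """Share of total power draw attributable to the flagged servers."""
    total = sum(s.avg_power_watts for s in servers)
    return sum(s.avg_power_watts for s in zombies) / total if total else 0.0


# Made-up fleet for illustration.
fleet = [
    ServerStats("web-01", 35.0, 300.0),
    ServerStats("batch-02", 0.5, 200.0),
    ServerStats("db-03", 60.0, 400.0),
    ServerStats("legacy-04", 1.0, 100.0),
]

zombies = find_zombie_servers(fleet)
print([s.name for s in zombies])
print(potential_saving_fraction(fleet, zombies))
```

In this invented fleet, two of the four servers are flagged, and together they account for 300 of the 1,000 watts drawn. Real fleets would need a longer observation window and more signals than CPU alone before anything is switched off.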

 

3. DCIM Tools: Data Center Infrastructure Management (DCIM) tools can help data centers improve energy efficiency through:

  • Examining data center architecture
  • System management
  • Asset tracking
  • Energy management
  • Capacity planning

 

4. Request a Green Certification: Leadership in Energy and Environmental Design (LEED) is one globally recognized certification for green buildings. It also provides guidance on eco-friendly technologies and sustainable building practices. To lessen their carbon footprint, data centers should aim to become green certified.

 

5. Utilize Effective Water-Cooling Systems: Track the Water Usage Effectiveness (WUE) factor.

The Formula to Calculate WUE

    WUE = Total Water Used by the Facility / Energy Consumed Solely by the IT Equipment

The higher the WUE, the more water intensive the data center is.
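As a quick worked example, WUE can be computed directly from the two quantities in the formula. The function name and the annual figures below are made up for illustration; WUE is commonly expressed in liters of water per kilowatt-hour of IT energy.

```python
def water_usage_effectiveness(water_liters, it_energy_kwh):
    """WUE in liters per kWh: facility water use divided by IT equipment energy."""
    if it_energy_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return water_liters / it_energy_kwh


# Made-up annual figures for a hypothetical facility.
wue = water_usage_effectiveness(water_liters=9_000_000, it_energy_kwh=5_000_000)
print(wue)  # 1.8
```

A facility that cut its water use to 5,000,000 liters for the same IT load would bring its WUE down to 1.0.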

 

6. Improve Carbon Usage Effectiveness (CUE)

The Formula to Calculate CUE

    CUE = CO2 Emissions Caused by Total Data Center Energy / IT Equipment Energy

The lower the CUE, the less carbon the data center emits for each unit of IT energy.
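The CUE formula can be tried out with a few lines of Python. In this sketch the emissions are estimated by multiplying total facility energy by a grid emission factor. That estimate, the 0.4 kg CO2 per kWh factor, and the energy figures are all assumptions made for the example.

```python
def co2_from_energy(total_energy_kwh, grid_factor_kg_per_kwh):
    """Estimate CO2 emissions (kg) from energy use and an assumed grid emission factor."""
    return total_energy_kwh * grid_factor_kg_per_kwh


def carbon_usage_effectiveness(total_co2_kg, it_energy_kwh):
    """CUE in kg CO2 per kWh of IT equipment energy."""
    if it_energy_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_co2_kg / it_energy_kwh


# Made-up annual figures for a hypothetical facility.
total_energy_kwh = 7_500_000   # everything the facility consumes
it_energy_kwh = 5_000_000      # the portion that reaches IT equipment
emissions_kg = co2_from_energy(total_energy_kwh, grid_factor_kg_per_kwh=0.4)

cue = carbon_usage_effectiveness(emissions_kg, it_energy_kwh)
print(round(cue, 3))
```

With these numbers the facility emits about 0.6 kg of CO2 for every kWh delivered to IT equipment. Either a cleaner energy supply (a lower grid factor) or less overhead energy would lower the result.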

 

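7. Reduce Power Usage Effectiveness (PUE).

The Formula to Calculate PUE

    PUE = Total Data Center Energy / IT Equipment Energy

The ideal PUE value is 1.0, which indicates that all energy consumed by the data center is used to power the actual computing devices.

Some of the most efficient data centers in the world report PUE values of about 1.2 or lower.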
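PUE is the ratio of the total energy a data center consumes to the energy that reaches its IT equipment, so it is easy to compute once both figures are known. The sketch below uses made-up annual figures; the overhead_share helper is an illustrative addition that shows how much of the total goes to cooling, power distribution, and other non-IT loads.

```python
def power_usage_effectiveness(total_facility_kwh, it_equipment_kwh):
    """PUE: total facility energy divided by IT equipment energy (1.0 is ideal)."""
    if it_equipment_kwh <= 0:
        raise ValueError("IT equipment energy must be positive")
    return total_facility_kwh / it_equipment_kwh


def overhead_share(pue):
    """Fraction of total facility energy that does not reach IT equipment."""
    return 1 - 1 / pue


# Made-up annual figures for a hypothetical facility.
pue = power_usage_effectiveness(total_facility_kwh=6_000_000, it_equipment_kwh=5_000_000)
print(pue)                            # 1.2
print(round(overhead_share(pue), 3))  # 0.167
```

At a PUE of 1.2, about one sixth of the facility's energy is overhead. At the ideal value of 1.0, the overhead share would be zero.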

 

  • Moumita Majumdar
  • Oct, 01 2023
