Posts

Showing posts from June, 2022

DATA CLEANING WITH SQL (pt 3)

 Checking for NULL Values. it's important we check for NULL values across all tables and columns in our data in other to ensure the validity of our data. We can check for Null values using the IS NULL tool. In the video below, we check for the null values in the Property Address field using the query below: SELECT PropertyAddress FROM PROJ.dbo.Nash WHERE PropertyAddress IS NULL This returns a result that shows that we have 29 rows in the Property Address Field with Null values

DATA CLEANING WITH SQL (pt4)

 We have seen how to change data types using CAST and check for duplicates and Null values. But what do we do when we find some fields with NULL values and what do clean it out? One of the ways is by using COALESCE. Coalesce returns a NULL field with a NON-NULL value. The format for using COALESCE is COALESCE(the_column_with_null,replacement) In the video below, there are 20 rows with NULL values in the Property Address field and we would like to replace the NULL values with the Owner Address. The Query we use is: COALESCE(PropertyAddress, OwnerAddress) AS address. This returns a column with the Null values in the PropertyAddress replaced with the OwnerAddress

DATA CLEANING WITH SQP (pt2)

Checking for Duplicates It is very important we check for duplicates in our Data as part of Confirming the integrity of our data. There are some data fields that are not meant to have duplicates, for example, the UNIQUE ID column in the video below. We need to confirm that there are no duplicate values in the UNIQUE ID Column. We can do that using the query below: Select UniqueID, COUNT (UniqueID) FROM PROJ.dbo.Nash GROUP BY UniqueID HAVING COUNT (UniqueID) >1 At the end of the Query, we can see that no duplicate Values exist in the UNIQUE ID Feild

DATA CLEANING WITH SQL

 Data Cleaning is one of the most important skills to have as a Data Analyst/Scientist. There are several tools used to clean up data one of which is SQL. SQL is one of the most widely used tools in Data Cleaning because of its convenience in working with a large amount of data. You could be working with a dataset with over 50,000 rows in SQL. When it comes to Data Cleaning in SQL, there are different tools. We will be looking at CAST() Casting helps in changing a data field data type to a more convenient data type to aid our analysis. CAST works in the format below: CAST(input_data AS data-type) In the data set shown in the video below, the data type of the date field( SaleDate) is Date-time instead of Date. we can change it to date by using the Query below: SELECT CAST(SaleDate AS Date) AS Date FROM PROJ.dbo.Nash.
  BUILDING AN EXCEL DASHBOARD      DESCRIPTION This is a Descriptive Analysis of the Data on the World's population from 2010 to 2020.   The main information provided are Various Metrics (Series by which data is classified), Country, and Region. More on the Series is available below The Dashboard visualizes the data of the World Population by metrics such as: 1. Infant Mortality for both sexes per (1000 Live births). 2. Life Expectancy for both sexes(years) 3. Life Expectancy at birth female (years) 4. Life Expectancy at birth for males (years) 5. Maternal Mortality ratio (deaths per 100,000 population) 6. Population annual rate of increase(percent) 7. Total fertility rate (children per woman ) It also displays a time series from 2010 to 2022 at a five-year interval. The various charts on the dashboard were created to interact with each other by the use of the slicers DATA  The data was sourced from data.un.org.  Columns with information not usefu...