This is a study of Washington Post Guild members' salaries based on data that management of The Washington Post turned over on May 28, 2021, in response to a request by Guild members. Management provided two Excel files: one detailing the salaries of current Guild members working for The Post (as of the date of transmission) and one detailing the salaries of former Guild members who left The Post within the past six years.
What follows is an attempt to understand pay at The Washington Post. No single analysis should be taken on its own as evidence that pay disparities do or do not exist. The study starts with summary analyses of broad trends and goes deeper as it proceeds.
The only data manipulation done before the analysis was exporting the Excel files to CSV, converting dates from 'MM/DD/YYYY' to 'YYYY-MM-DD', and removing the thousands-separator commas from monetary values of 1,000 or more.
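The conversion step above is described but not shown, and the study does not say which tool performed it. As a minimal sketch, assuming each cell value arrives as a string, the two transformations could look like this in Python; `to_iso_date` and `strip_thousands` are hypothetical helper names, not part of the original workflow.

```python
from datetime import datetime


def to_iso_date(value: str) -> str:
    """Convert an 'MM/DD/YYYY' string to 'YYYY-MM-DD'; blank cells stay blank."""
    if not value.strip():
        return value
    return datetime.strptime(value.strip(), '%m/%d/%Y').strftime('%Y-%m-%d')


def strip_thousands(value: str) -> str:
    """Remove thousands-separator commas so '125,000.00' parses as a number."""
    return value.replace(',', '')
```

For example, `to_iso_date('05/28/2021')` returns `'2021-05-28'` and `strip_thousands('125,000.00')` returns `'125000.00'`.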
from pathlib import Path
import re
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.iolib.summary2 import summary_col
from linearmodels.iv import IV2SLS
import seaborn as sns
pd.options.display.max_columns = None
pd.set_option('display.float_format', lambda x: '%.2f' % x)
BASEDIR = Path.cwd()
CSVPATH = BASEDIR.joinpath('csvs')
active_wd_schema = {
'department': str,
'employee_id': str,
'gender': str,
'race_ethnicity': str,
'education': str,
'military_status': str,
'date_of_birth': str,
'original_hire_date': str,
'hire_date': str,
'pay_rate_type': str,
'current_base_pay': np.float64,
'job_profile_current': str,
'time_type_current': str,
'cost_center_current': str,
'effective_date1': str,
'business_process_type1': str,
'business_process_reason1': str,
'pay_rate_type1': str,
'base_pay_change1': np.float64,
'job_profile1': str,
'time_type1': str,
'cost_center1': str,
'effective_date2': str,
'business_process_type2': str,
'business_process_reason2': str,
'pay_rate_type2': str,
'base_pay_change2': np.float64,
'job_profile2': str,
'time_type2': str,
'cost_center2': str,
'effective_date3': str,
'business_process_type3': str,
'business_process_reason3': str,
'pay_rate_type3': str,
'base_pay_change3': np.float64,
'job_profile3': str,
'time_type3': str,
'cost_center3': str,
'effective_date4': str,
'business_process_type4': str,
'business_process_reason4': str,
'pay_rate_type4': str,
'base_pay_change4': np.float64,
'job_profile4': str,
'time_type4': str,
'cost_center4': str,
'effective_date5': str,
'business_process_type5': str,
'business_process_reason5': str,
'pay_rate_type5': str,
'base_pay_change5': np.float64,
'job_profile5': str,
'time_type5': str,
'cost_center5': str,
'effective_date6': str,
'business_process_type6': str,
'business_process_reason6': str,
'pay_rate_type6': str,
'base_pay_change6': np.float64,
'job_profile6': str,
'time_type6': str,
'cost_center6': str,
'effective_date7': str,
'business_process_type7': str,
'business_process_reason7': str,
'pay_rate_type7': str,
'base_pay_change7': np.float64,
'job_profile7': str,
'time_type7': str,
'cost_center7': str,
'effective_date8': str,
'business_process_type8': str,
'business_process_reason8': str,
'pay_rate_type8': str,
'base_pay_change8': np.float64,
'job_profile8': str,
'time_type8': str,
'cost_center8': str,
'effective_date9': str,
'business_process_type9': str,
'business_process_reason9': str,
'pay_rate_type9': str,
'base_pay_change9': np.float64,
'job_profile9': str,
'time_type9': str,
'cost_center9': str,
'effective_date10': str,
'business_process_type10': str,
'business_process_reason10': str,
'pay_rate_type10': str,
'base_pay_change10': np.float64,
'job_profile10': str,
'time_type10': str,
'cost_center10': str,
'effective_date11': str,
'business_process_type11': str,
'business_process_reason11': str,
'pay_rate_type11': str,
'base_pay_change11': np.float64,
'job_profile11': str,
'time_type11': str,
'cost_center11': str,
'effective_date12': str,
'business_process_type12': str,
'business_process_reason12': str,
'pay_rate_type12': str,
'base_pay_change12': np.float64,
'job_profile12': str,
'time_type12': str,
'cost_center12': str,
'effective_date13': str,
'business_process_type13': str,
'business_process_reason13': str,
'pay_rate_type13': str,
'base_pay_change13': np.float64,
'job_profile13': str,
'time_type13': str,
'cost_center13': str,
'effective_date14': str,
'business_process_type14': str,
'business_process_reason14': str,
'pay_rate_type14': str,
'base_pay_change14': np.float64,
'job_profile14': str,
'time_type14': str,
'cost_center14': str,
'effective_date15': str,
'business_process_type15': str,
'business_process_reason15': str,
'pay_rate_type15': str,
'base_pay_change15': np.float64,
'job_profile15': str,
'time_type15': str,
'cost_center15': str,
'effective_date16': str,
'business_process_type16': str,
'business_process_reason16': str,
'pay_rate_type16': str,
'base_pay_change16': np.float64,
'job_profile16': str,
'time_type16': str,
'cost_center16': str,
'effective_date17': str,
'business_process_type17': str,
'business_process_reason17': str,
'pay_rate_type17': str,
'base_pay_change17': np.float64,
'job_profile17': str,
'time_type17': str,
'cost_center17': str,
'effective_date18': str,
'business_process_type18': str,
'business_process_reason18': str,
'pay_rate_type18': str,
'base_pay_change18': np.float64,
'job_profile18': str,
'time_type18': str,
'cost_center18': str,
'effective_date19': str,
'business_process_type19': str,
'business_process_reason19': str,
'pay_rate_type19': str,
'base_pay_change19': np.float64,
'job_profile19': str,
'time_type19': str,
'cost_center19': str,
'effective_date20': str,
'business_process_type20': str,
'business_process_reason20': str,
'pay_rate_type20': str,
'base_pay_change20': np.float64,
'job_profile20': str,
'time_type20': str,
'cost_center20': str,
'effective_date21': str,
'business_process_type21': str,
'business_process_reason21': str,
'pay_rate_type21': str,
'base_pay_change21': np.float64,
'job_profile21': str,
'time_type21': str,
'cost_center21': str,
'effective_date22': str,
'business_process_type22': str,
'business_process_reason22': str,
'pay_rate_type22': str,
'base_pay_change22': np.float64,
'job_profile22': str,
'time_type22': str,
'cost_center22': str,
'effective_date23': str,
'business_process_type23': str,
'business_process_reason23': str,
'pay_rate_type23': str,
'base_pay_change23': np.float64,
'job_profile23': str,
'time_type23': str,
'cost_center23': str,
'effective_date24': str,
'business_process_type24': str,
'business_process_reason24': str,
'pay_rate_type24': str,
'base_pay_change24': np.float64,
'job_profile24': str,
'time_type24': str,
'cost_center24': str,
'effective_date25': str,
'business_process_type25': str,
'business_process_reason25': str,
'job_profile25': str,
'cost_center25': str,
'effective_date26': str,
'business_process_type26': str,
'business_process_reason26': str,
'job_profile26': str,
'cost_center26': str,
'2008_annual_performance_rating': np.float64,
'2009_annual_performance_rating': np.float64,
'2010_annual_performance_rating': np.float64,
'2011_annual_performance_rating': np.float64,
'2012_annual_performance_rating': np.float64,
'2013_annual_performance_rating': np.float64,
'2014_annual_performance_rating': np.float64,
'2015_annual_performance_rating': np.float64,
'2016_annual_performance_rating': np.float64,
'2017_annual_performance_rating': np.float64,
'2018_annual_performance_rating': np.float64,
'2019_annual_performance_rating': np.float64,
'2020_annual_performance_rating': np.float64
}
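The schema above repeats the same eight fields for each of the first 24 pay-change events and five fields for events 25 and 26. A sketch that builds an equivalent mapping programmatically, assuming that structure holds; `build_schema` and the field-list constants are hypothetical names, not part of the original notebook.

```python
import numpy as np

# Personnel fields read as text (dates are parsed separately via parse_dates).
BASE_FIELDS = ['department', 'employee_id', 'gender', 'race_ethnicity', 'education',
               'military_status', 'date_of_birth', 'original_hire_date', 'hire_date']
CURRENT_FIELDS = {'pay_rate_type': str, 'current_base_pay': np.float64,
                  'job_profile_current': str, 'time_type_current': str,
                  'cost_center_current': str}
# Fields repeated, with a numeric suffix, for each complete pay-change event.
FULL_EVENT = {'effective_date': str, 'business_process_type': str,
              'business_process_reason': str, 'pay_rate_type': str,
              'base_pay_change': np.float64, 'job_profile': str,
              'time_type': str, 'cost_center': str}
# Later events that carry no pay columns.
PARTIAL_EVENT = ['effective_date', 'business_process_type', 'business_process_reason',
                 'job_profile', 'cost_center']


def build_schema(n_full_events, n_partial_events=0, extra_date_fields=()):
    """Return a read_csv dtype mapping with numbered event columns."""
    schema = {name: str for name in BASE_FIELDS}
    schema.update({name: str for name in extra_date_fields})
    schema.update(CURRENT_FIELDS)
    for n in range(1, n_full_events + 1):
        schema.update({f'{field}{n}': dtype for field, dtype in FULL_EVENT.items()})
    for n in range(n_full_events + 1, n_full_events + n_partial_events + 1):
        schema.update({f'{field}{n}': str for field in PARTIAL_EVENT})
    schema.update({f'{year}_annual_performance_rating': np.float64
                   for year in range(2008, 2021)})
    return schema
```

Under these assumptions, `build_schema(24, 2)` has the same 229 keys and dtypes as `active_wd_schema`, and `build_schema(15, extra_date_fields=('termination_date',))` has the 148 keys of `terminated_wd_schema`.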
# Date columns to parse: three personnel dates plus one effective date per pay-change event (1-26).
parse_dates = ['date_of_birth', 'original_hire_date', 'hire_date'] + [f'effective_date{n}' for n in range(1, 27)]
terminated_wd_schema = {
'department': str,
'employee_id': str,
'gender': str,
'race_ethnicity': str,
'education': str,
'military_status': str,
'date_of_birth': str,
'original_hire_date': str,
'hire_date': str,
'termination_date': str,
'pay_rate_type': str,
'current_base_pay': np.float64,
'job_profile_current': str,
'time_type_current': str,
'cost_center_current': str,
'effective_date1': str,
'business_process_type1': str,
'business_process_reason1': str,
'pay_rate_type1': str,
'base_pay_change1': np.float64,
'job_profile1': str,
'time_type1': str,
'cost_center1': str,
'effective_date2': str,
'business_process_type2': str,
'business_process_reason2': str,
'pay_rate_type2': str,
'base_pay_change2': np.float64,
'job_profile2': str,
'time_type2': str,
'cost_center2': str,
'effective_date3': str,
'business_process_type3': str,
'business_process_reason3': str,
'pay_rate_type3': str,
'base_pay_change3': np.float64,
'job_profile3': str,
'time_type3': str,
'cost_center3': str,
'effective_date4': str,
'business_process_type4': str,
'business_process_reason4': str,
'pay_rate_type4': str,
'base_pay_change4': np.float64,
'job_profile4': str,
'time_type4': str,
'cost_center4': str,
'effective_date5': str,
'business_process_type5': str,
'business_process_reason5': str,
'pay_rate_type5': str,
'base_pay_change5': np.float64,
'job_profile5': str,
'time_type5': str,
'cost_center5': str,
'effective_date6': str,
'business_process_type6': str,
'business_process_reason6': str,
'pay_rate_type6': str,
'base_pay_change6': np.float64,
'job_profile6': str,
'time_type6': str,
'cost_center6': str,
'effective_date7': str,
'business_process_type7': str,
'business_process_reason7': str,
'pay_rate_type7': str,
'base_pay_change7': np.float64,
'job_profile7': str,
'time_type7': str,
'cost_center7': str,
'effective_date8': str,
'business_process_type8': str,
'business_process_reason8': str,
'pay_rate_type8': str,
'base_pay_change8': np.float64,
'job_profile8': str,
'time_type8': str,
'cost_center8': str,
'effective_date9': str,
'business_process_type9': str,
'business_process_reason9': str,
'pay_rate_type9': str,
'base_pay_change9': np.float64,
'job_profile9': str,
'time_type9': str,
'cost_center9': str,
'effective_date10': str,
'business_process_type10': str,
'business_process_reason10': str,
'pay_rate_type10': str,
'base_pay_change10': np.float64,
'job_profile10': str,
'time_type10': str,
'cost_center10': str,
'effective_date11': str,
'business_process_type11': str,
'business_process_reason11': str,
'pay_rate_type11': str,
'base_pay_change11': np.float64,
'job_profile11': str,
'time_type11': str,
'cost_center11': str,
'effective_date12': str,
'business_process_type12': str,
'business_process_reason12': str,
'pay_rate_type12': str,
'base_pay_change12': np.float64,
'job_profile12': str,
'time_type12': str,
'cost_center12': str,
'effective_date13': str,
'business_process_type13': str,
'business_process_reason13': str,
'pay_rate_type13': str,
'base_pay_change13': np.float64,
'job_profile13': str,
'time_type13': str,
'cost_center13': str,
'effective_date14': str,
'business_process_type14': str,
'business_process_reason14': str,
'pay_rate_type14': str,
'base_pay_change14': np.float64,
'job_profile14': str,
'time_type14': str,
'cost_center14': str,
'effective_date15': str,
'business_process_type15': str,
'business_process_reason15': str,
'pay_rate_type15': str,
'base_pay_change15': np.float64,
'job_profile15': str,
'time_type15': str,
'cost_center15': str,
'2008_annual_performance_rating': np.float64,
'2009_annual_performance_rating': np.float64,
'2010_annual_performance_rating': np.float64,
'2011_annual_performance_rating': np.float64,
'2012_annual_performance_rating': np.float64,
'2013_annual_performance_rating': np.float64,
'2014_annual_performance_rating': np.float64,
'2015_annual_performance_rating': np.float64,
'2016_annual_performance_rating': np.float64,
'2017_annual_performance_rating': np.float64,
'2018_annual_performance_rating': np.float64,
'2019_annual_performance_rating': np.float64,
'2020_annual_performance_rating': np.float64
}
# Date columns to parse: four personnel dates plus one effective date per pay-change event (1-15).
parse_dates2 = ['date_of_birth', 'original_hire_date', 'hire_date', 'termination_date'] + [f'effective_date{n}' for n in range(1, 16)]
df = pd.read_csv(CSVPATH.joinpath('active_wd_2021.csv'), dtype=active_wd_schema, parse_dates=parse_dates)
df2 = pd.read_csv(CSVPATH.joinpath('terminated_wd_2021.csv'), dtype=terminated_wd_schema, parse_dates=parse_dates2)
date_received = np.datetime64('2021-05-28')
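# Whole years elapsed as of the transmission date, counting a year as the
# average Gregorian year of 365.2425 days (the convention of NumPy's 'Y'
# timedelta unit). Casting a timedelta Series with .astype('<m8[Y]') is not
# supported in pandas 2.x. Missing dates stay NaN.
# For departed employees (df2) these values are also measured to the
# transmission date, not to termination_date.
def whole_years(delta):
    return np.floor(delta.dt.days / 365.2425)
df['age'] = whole_years(date_received - df['date_of_birth'])
df['years_of_service'] = whole_years(date_received - df['hire_date'])
df2['age'] = whole_years(date_received - df2['date_of_birth'])
df2['years_of_service'] = whole_years(date_received - df2['hire_date'])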
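The age and tenure columns count a year as 365.2425 days, the average Gregorian year. Because the age and tenure bands start at exact anniversaries, it may help to see how that compares with a calendar count. A minimal pandas sketch of both conventions; `average_year_age` and `calendar_age` are hypothetical names.

```python
import numpy as np
import pandas as pd


def average_year_age(start: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Whole years, counting a year as 365.2425 days; NaT becomes NaN."""
    return np.floor((ref - start).dt.days / 365.2425)


def calendar_age(start: pd.Series, ref: pd.Timestamp) -> pd.Series:
    """Whole years counted by calendar anniversaries; NaT becomes NaN."""
    years = ref.year - start.dt.year
    # Subtract one if this year's anniversary has not happened yet.
    before_anniversary = (start.dt.month > ref.month) | (
        (start.dt.month == ref.month) & (start.dt.day > ref.day)
    )
    return (years - before_anniversary.astype(int)).where(start.notna())
```

For someone born on 2000-05-28, the average-year count gives 20 on 2021-05-28 while the calendar count gives 21, so a handful of employees near a birthday or hire anniversary can land in a different band depending on the convention.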
bins= [0,25,30,35,40,45,50,55,60,65,100]
labels = ['<25','25-29','30-34','35-39','40-44', '45-49','50-54','55-59','60-64','65+']
df['age_group_5'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)
df2['age_group_5'] = pd.cut(df2['age'], bins=bins, labels=labels, right=False)
bins= [0,25,35,45,55,65,100]
labels = ['<25','25-34','35-44','45-54','55-64','65+']
df['age_group_10'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)
df2['age_group_10'] = pd.cut(df2['age'], bins=bins, labels=labels, right=False)
bins= [0,1,3,6,11,16,21,26,100]
# right=False makes each bin [lower, upper), so the last bin holds 26+ years.
labels = ['0','1-2','3-5','6-10','11-15','16-20','21-25','26+']
df['years_of_service_grouped'] = pd.cut(df['years_of_service'], bins=bins, labels=labels, right=False)
df2['years_of_service_grouped'] = pd.cut(df2['years_of_service'], bins=bins, labels=labels, right=False)
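With `right=False`, each bin includes its lower edge and excludes its upper edge, so the tenure edges above put 25 years in [21, 26) and 26 or more years in [26, 100); a top label consistent with those edges is '26+'. A small check of the boundary behavior:

```python
import pandas as pd

# Same edges as the tenure grouping; each bin is [lower, upper).
bins = [0, 1, 3, 6, 11, 16, 21, 26, 100]
labels = ['0', '1-2', '3-5', '6-10', '11-15', '16-20', '21-25', '26+']
groups = pd.cut([0, 1, 2, 25, 26], bins=bins, labels=labels, right=False)
print(list(groups))  # ['0', '1-2', '1-2', '21-25', '26+']
```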
def dept(row):
    NEWS_DEPTS = ['News', 'Editorial', 'News Service and Syndicate']
    COMMERCIAL_DEPTS = [
        'Client Solutions', 'Circulation', 'Finance', 'Marketing', 'WP News Media Services', 'Production', 'Public Relations', 'Administration', 'Product', 'Audience Development and Insights', 'Customer Care and Logistics', 'Legal', 'Washington Post Live'
    ]
    if row['department'] in NEWS_DEPTS:
        return 'News'
    elif row['department'] in COMMERCIAL_DEPTS:
        return 'Commercial'
    else:
        return 'Unknown'
df['dept'] = df.apply(lambda row: dept(row), axis=1)
df2['dept'] = df2.apply(lambda row: dept(row), axis=1)
# Newsroom desks and the cost centers that belong to each. Any cost center not
# listed here (including every non-news department) maps to 'non-newsroom'.
DESK_COST_CENTERS = {
    'Operations': ['110000 News Operations','110001 News Digital Operations'],
    'Audience Development and Engagement': ['110610 Audience Development and Engagement'],
    'Audio': ['110620 News Audio'],
    'Design': ['110604 Presentation Design'],
    'Photography': ['110605 Presentation'],
    'Emerging News Products': ['110664 News National Apps','110665 News The Lily','110666 News Snapchat','110667 News By The Way'],
    'Financial': ['113210 Economy and Business'],
    'Foreign': ['114000 Foreign Administration','114095 News Foreign Brazil','114100 Foreign Latam','114220 News Foreign Istanbul','114235 Foreign Western Europe','114300 News Foreign West Africa','114415 Foreign Hong Kong','114405 Foreign Beijing Bureau','114105 Foreign Mexico Bureau','114005 Foreign Beirut Bureau','114400 Foreign India Bureau','114410 Foreign Tokyo Bureau','114205 Foreign Islamabad Bureau','114305 Foreign Nairobi Bureau','114240 Foreign Rome Bureau','114200 Foreign London Bureau','114230 Foreign Moscow Bureau','114225 Foreign Cairo Bureau','114215 Foreign Berlin Bureau','114310 Foreign Baghdad Bureau','114315 Foreign Jerusalem Bureau'],
    'Graphics': ['110603 Presentation Graphics'],
    # Needed for the 'Investigative' entry in Tier 1 to match.
    'Investigative': ['110450 Investigative'],
    'Local': ['112300 Local Politics and Government'],
    'Multiplatform': ['110601 Multiplatform Desk'],
    'National': ['110500 Magazine','113200 National Politics and Government','113205 National Security','113215 News National Health & Science','113220 National Enterprise','113235 National America','113240 News National Environment'],
    'News Content and Research': ['110006 News Content & Research'],
    'News Logistics': ['110455 News Logistics'],
    'Outlook': ['110410 Book World','110460 Outlook'],
    'Polling': ['110475 Polling'],
    'Sports': ['110015 Sports Main'],
    'Style': ['110300 Style','110435 Food','110485 Travel','110495 Local Living','110505 Weekend'],
    'Universal Desk': ['110600 Universal Desk'],
    'Video': ['110652 News Video - General'],
    'Other': ['110663 Wake Up Report'],
    'Editorial': ['115000 Editorial Administration'],
}
def desk(row):
    for desk_name, cost_centers in DESK_COST_CENTERS.items():
        if row['cost_center_current'] in cost_centers:
            return desk_name
    return 'non-newsroom'
df['desk'] = df.apply(lambda row: desk(row), axis=1)
df2['desk'] = df2.apply(lambda row: desk(row), axis=1)
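`desk()` assigns 'non-newsroom' to any cost center missing from its lists, so a News-department cost center that the lookup misses silently drops out of the desk and tier analyses. A quick audit, as a sketch; `unmapped_news_cost_centers` is a hypothetical helper.

```python
import pandas as pd


def unmapped_news_cost_centers(frame: pd.DataFrame) -> pd.Series:
    """Count News-department cost centers that were assigned 'non-newsroom'."""
    mask = (frame['dept'] == 'News') & (frame['desk'] == 'non-newsroom')
    return frame.loc[mask, 'cost_center_current'].value_counts()
```

Running it on `df` and `df2` after the desk assignment would list any such cost centers with their headcounts; an empty result means every News-department row matched a desk.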
def tier(row):
    TIER1 = ['National','Foreign','Financial','Investigative']
    TIER2 = ['Style','Local','Graphics','Universal Desk','Sports','Outlook','Editorial']
    TIER3 = ['Audio','Polling','Design','Operations','Multiplatform','Video','Audience Development and Engagement','Photography']
    TIER4 = ['News Logistics','News Content and Research','Emerging News Products','Other']
    if row['desk'] in TIER1:
        return 'Tier 1'
    elif row['desk'] in TIER2:
        return 'Tier 2'
    elif row['desk'] in TIER3:
        return 'Tier 3'
    elif row['desk'] in TIER4:
        return 'Tier 4'
    else:
        return 'other'
df['tier'] = df.apply(lambda row: tier(row), axis=1)
df2['tier'] = df2.apply(lambda row: tier(row), axis=1)
def race_groups(row):
    WHITE = ['White (United States of America)']
    NONWHITE = [
        'Black or African American (United States of America)', 'Asian (United States of America)', 'Hispanic or Latino (United States of America)', 'Two or More Races (United States of America)', 'American Indian or Alaska Native (United States of America)', 'Native Hawaiian or Other Pacific Islander (United States of America)'
    ]
    if row['race_ethnicity'] in WHITE:
        return 'white'
    elif row['race_ethnicity'] in NONWHITE:
        return 'person of color'
    else:
        return 'unknown'
df['race_grouping'] = df.apply(lambda row: race_groups(row), axis=1)
df2['race_grouping'] = df2.apply(lambda row: race_groups(row), axis=1)
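`race_groups()` is applied row by row with `apply(axis=1)`, which calls Python once per row. A dictionary lookup with `Series.map` is a common vectorized alternative that yields the same three labels; a sketch, with `RACE_GROUP_LOOKUP` and `map_race_grouping` as hypothetical names.

```python
import pandas as pd

# Mirrors race_groups(): one white category, six person-of-color categories.
RACE_GROUP_LOOKUP = {'White (United States of America)': 'white'}
RACE_GROUP_LOOKUP.update({label: 'person of color' for label in [
    'Black or African American (United States of America)',
    'Asian (United States of America)',
    'Hispanic or Latino (United States of America)',
    'Two or More Races (United States of America)',
    'American Indian or Alaska Native (United States of America)',
    'Native Hawaiian or Other Pacific Islander (United States of America)',
]})


def map_race_grouping(race_ethnicity: pd.Series) -> pd.Series:
    """Labels not in the lookup, including missing values, become 'unknown'."""
    return race_ethnicity.map(RACE_GROUP_LOOKUP).fillna('unknown')
```

Usage would be `df['race_grouping'] = map_race_grouping(df['race_ethnicity'])`.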
# Pull the nth numbered pay-change event into its own frame, renaming its
# numbered columns to generic names and keeping the demographic, grouping and
# performance-rating columns alongside. (Events 25 and 26 have no
# base_pay_change or pay_rate_type columns.)
EVENT_COLS = ['business_process_reason', 'base_pay_change', 'effective_date', 'pay_rate_type']
EVENT_CONTEXT_COLS = ['gender', 'race_ethnicity', 'race_grouping', 'age_group_5', 'dept', 'tier'] + [
    f'{year}_annual_performance_rating' for year in range(2008, 2021)
]
def change_event(frame, n):
    numbered = [f'{col}{n}' for col in EVENT_COLS]
    return frame[numbered + EVENT_CONTEXT_COLS].rename(columns=dict(zip(numbered, EVENT_COLS)))
reason_for_change1 = change_event(df, 1)
reason_for_change2 = change_event(df, 2)
reason_for_change3 = change_event(df, 3)
reason_for_change4 = change_event(df, 4)
reason_for_change5 = change_event(df, 5)
reason_for_change6 = change_event(df, 6)
reason_for_change7 = change_event(df, 7)
reason_for_change8 = change_event(df, 8)
reason_for_change9 = change_event(df, 9)
reason_for_change10 = change_event(df, 10)
reason_for_change11 = change_event(df, 11)
reason_for_change12 = change_event(df, 12)
reason_for_change13 = change_event(df, 13)
reason_for_change14 = change_event(df, 14)
reason_for_change15 = change_event(df, 15)
reason_for_change16 = change_event(df, 16)
reason_for_change17 = change_event(df, 17)
reason_for_change18 = change_event(df, 18)
reason_for_change19 = change_event(df, 19)
reason_for_change20 = change_event(df, 20)
reason_for_change21 = df[['business_process_reason21','base_pay_change21','effective_date21','pay_rate_type21','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason21':'business_process_reason','base_pay_change21':'base_pay_change','effective_date21':'effective_date','pay_rate_type21':'pay_rate_type'})
reason_for_change22 = df[['business_process_reason22','base_pay_change22','effective_date22','pay_rate_type22','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason22':'business_process_reason','base_pay_change22':'base_pay_change','effective_date22':'effective_date','pay_rate_type22':'pay_rate_type'})
reason_for_change23 = df[['business_process_reason23','base_pay_change23','effective_date23','pay_rate_type23','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason23':'business_process_reason','base_pay_change23':'base_pay_change','effective_date23':'effective_date','pay_rate_type23':'pay_rate_type'})
reason_for_change24 = df[['business_process_reason24','base_pay_change24','effective_date24','pay_rate_type24','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason24':'business_process_reason','base_pay_change24':'base_pay_change','effective_date24':'effective_date','pay_rate_type24':'pay_rate_type'})
reason_for_change25 = df2[['business_process_reason1','base_pay_change1','effective_date1','pay_rate_type1','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason1':'business_process_reason','base_pay_change1':'base_pay_change','effective_date1':'effective_date','pay_rate_type1':'pay_rate_type'})
reason_for_change26 = df2[['business_process_reason2','base_pay_change2','effective_date2','pay_rate_type2','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason2':'business_process_reason','base_pay_change2':'base_pay_change','effective_date2':'effective_date','pay_rate_type2':'pay_rate_type'})
reason_for_change27 = df2[['business_process_reason3','base_pay_change3','effective_date3','pay_rate_type3','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason3':'business_process_reason','base_pay_change3':'base_pay_change','effective_date3':'effective_date','pay_rate_type3':'pay_rate_type'})
reason_for_change28 = df2[['business_process_reason4','base_pay_change4','effective_date4','pay_rate_type4','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason4':'business_process_reason','base_pay_change4':'base_pay_change','effective_date4':'effective_date','pay_rate_type4':'pay_rate_type'})
reason_for_change29 = df2[['business_process_reason5','base_pay_change5','effective_date5','pay_rate_type5','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason5':'business_process_reason','base_pay_change5':'base_pay_change','effective_date5':'effective_date','pay_rate_type5':'pay_rate_type'})
reason_for_change30 = df2[['business_process_reason6','base_pay_change6','effective_date6','pay_rate_type6','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason6':'business_process_reason','base_pay_change6':'base_pay_change','effective_date6':'effective_date','pay_rate_type6':'pay_rate_type'})
reason_for_change31 = df2[['business_process_reason7','base_pay_change7','effective_date7','pay_rate_type7','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason7':'business_process_reason','base_pay_change7':'base_pay_change','effective_date7':'effective_date','pay_rate_type7':'pay_rate_type'})
reason_for_change32 = df2[['business_process_reason8','base_pay_change8','effective_date8','pay_rate_type8','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason8':'business_process_reason','base_pay_change8':'base_pay_change','effective_date8':'effective_date','pay_rate_type8':'pay_rate_type'})
reason_for_change33 = df2[['business_process_reason9','base_pay_change9','effective_date9','pay_rate_type9','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason9':'business_process_reason','base_pay_change9':'base_pay_change','effective_date9':'effective_date','pay_rate_type9':'pay_rate_type'})
reason_for_change34 = df2[['business_process_reason10','base_pay_change10','effective_date10','pay_rate_type10','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason10':'business_process_reason','base_pay_change10':'base_pay_change','effective_date10':'effective_date','pay_rate_type10':'pay_rate_type'})
reason_for_change35 = df2[['business_process_reason11','base_pay_change11','effective_date11','pay_rate_type11','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason11':'business_process_reason','base_pay_change11':'base_pay_change','effective_date11':'effective_date','pay_rate_type11':'pay_rate_type'})
reason_for_change36 = df2[['business_process_reason12','base_pay_change12','effective_date12','pay_rate_type12','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason12':'business_process_reason','base_pay_change12':'base_pay_change','effective_date12':'effective_date','pay_rate_type12':'pay_rate_type'})
reason_for_change37 = df2[['business_process_reason13','base_pay_change13','effective_date13','pay_rate_type13','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason13':'business_process_reason','base_pay_change13':'base_pay_change','effective_date13':'effective_date','pay_rate_type13':'pay_rate_type'})
reason_for_change38 = df2[['business_process_reason14','base_pay_change14','effective_date14','pay_rate_type14','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason14':'business_process_reason','base_pay_change14':'base_pay_change','effective_date14':'effective_date','pay_rate_type14':'pay_rate_type'})
reason_for_change39 = df2[['business_process_reason15','base_pay_change15','effective_date15','pay_rate_type15','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2008_annual_performance_rating','2009_annual_performance_rating','2010_annual_performance_rating','2011_annual_performance_rating','2012_annual_performance_rating','2013_annual_performance_rating','2014_annual_performance_rating','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','2019_annual_performance_rating','2020_annual_performance_rating']].rename(columns={'business_process_reason15':'business_process_reason','base_pay_change15':'base_pay_change','effective_date15':'effective_date','pay_rate_type15':'pay_rate_type'})
# Stack all 39 pay-change slices (each is already a DataFrame, so no pd.DataFrame() re-wrap is needed).
reason_for_change_combined = pd.concat([
    reason_for_change1, reason_for_change2, reason_for_change3, reason_for_change4, reason_for_change5,
    reason_for_change6, reason_for_change7, reason_for_change8, reason_for_change9, reason_for_change10,
    reason_for_change11, reason_for_change12, reason_for_change13, reason_for_change14, reason_for_change15,
    reason_for_change16, reason_for_change17, reason_for_change18, reason_for_change19, reason_for_change20,
    reason_for_change21, reason_for_change22, reason_for_change23, reason_for_change24, reason_for_change25,
    reason_for_change26, reason_for_change27, reason_for_change28, reason_for_change29, reason_for_change30,
    reason_for_change31, reason_for_change32, reason_for_change33, reason_for_change34, reason_for_change35,
    reason_for_change36, reason_for_change37, reason_for_change38, reason_for_change39,
])
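The 39 near-identical slices above differ only in the slot number and the source file. A minimal sketch of the same reshaping as a loop, assuming the numbered column naming used above (`business_process_reason{k}`, `base_pay_change{k}`, and so on); the helper `stack_change_slots`, the extra `change_slot` column, and the toy values are hypothetical:

```python
import pandas as pd

# Per-slot field stems: slot k holds e.g. 'base_pay_change{k}' (naming taken from the code above).
STEMS = ['business_process_reason', 'base_pay_change', 'effective_date', 'pay_rate_type']


def stack_change_slots(frame, n_slots, keep_cols, stems=STEMS):
    """Stack numbered change slots 1..n_slots into one long DataFrame.

    Each output row is one (employee, slot) pair; keep_cols are copied
    alongside every slot, as the demographic and rating columns are above.
    """
    pieces = []
    for k in range(1, n_slots + 1):
        slot_cols = [f'{stem}{k}' for stem in stems]
        piece = (
            frame[slot_cols + list(keep_cols)]
            .rename(columns=dict(zip(slot_cols, stems)))  # drop the slot number from the names
            .assign(change_slot=k)  # hypothetical extra column recording the source slot
        )
        pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)


# Made-up toy data with two change slots.
toy = pd.DataFrame({
    'gender': ['Female', 'Male'],
    'dept': ['News', 'Commercial'],
    'business_process_reason1': ['Merit', 'Promotion'],
    'base_pay_change1': [1500.0, 4000.0],
    'effective_date1': ['2019-04-01', '2020-01-15'],
    'pay_rate_type1': ['Salaried', 'Salaried'],
    'business_process_reason2': ['Merit', None],
    'base_pay_change2': [1200.0, None],
    'effective_date2': ['2020-04-01', None],
    'pay_rate_type2': ['Salaried', None],
})
long_changes = stack_change_slots(toy, n_slots=2, keep_cols=['gender', 'dept'])
```

Assuming `df` holds slots 1 through 24 and `df2` slots 1 through 15, as the slices above suggest, `pd.concat([stack_change_slots(df, 24, keep), stack_change_slots(df2, 15, keep)], ignore_index=True)`, with `keep` set to the demographic and rating columns, would hold the same rows. Unlike the original, `ignore_index=True` gives the result a fresh, unique index.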
eight1 = df[['2008_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2008_annual_performance_rating':'performance_rating'})
eight2 = df2[['2008_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2008_annual_performance_rating':'performance_rating'})
nine1 = df[['2009_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2009_annual_performance_rating':'performance_rating'})
nine2 = df2[['2009_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2009_annual_performance_rating':'performance_rating'})
ten1 = df[['2010_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2010_annual_performance_rating':'performance_rating'})
ten2 = df2[['2010_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2010_annual_performance_rating':'performance_rating'})
eleven1 = df[['2011_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2011_annual_performance_rating':'performance_rating'})
eleven2 = df2[['2011_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2011_annual_performance_rating':'performance_rating'})
twelve1 = df[['2012_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2012_annual_performance_rating':'performance_rating'})
twelve2 = df2[['2012_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2012_annual_performance_rating':'performance_rating'})
thirteen1 = df[['2013_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2013_annual_performance_rating':'performance_rating'})
thirteen2 = df2[['2013_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2013_annual_performance_rating':'performance_rating'})
fourteen1 = df[['2014_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2014_annual_performance_rating':'performance_rating'})
fourteen2 = df2[['2014_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2014_annual_performance_rating':'performance_rating'})
fifteen1 = df[['2015_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
fifteen2 = df2[['2015_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
sixteen1 = df[['2016_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
sixteen2 = df2[['2016_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
seventeen1 = df[['2017_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
seventeen2 = df2[['2017_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
eighteen1 = df[['2018_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
eighteen2 = df2[['2018_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
nineteen1 = df[['2019_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2019_annual_performance_rating':'performance_rating'})
nineteen2 = df2[['2019_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2019_annual_performance_rating':'performance_rating'})
twenty1 = df[['2020_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2020_annual_performance_rating':'performance_rating'})
twenty2 = df2[['2020_annual_performance_rating','gender','race_ethnicity','race_grouping','dept']].rename(columns={'2020_annual_performance_rating':'performance_rating'})
# Stack the yearly rating slices (current and terminated employees, 2008 through 2020) into one table.
ratings_combined = pd.concat([
    eight1, eight2, nine1, nine2, ten1, ten2, eleven1, eleven2, twelve1, twelve2,
    thirteen1, thirteen2, fourteen1, fourteen2, fifteen1, fifteen2, sixteen1, sixteen2,
    seventeen1, seventeen2, eighteen1, eighteen2, nineteen1, nineteen2, twenty1, twenty2,
])
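Stacking the yearly slices this way drops the year each rating belongs to. A sketch using `DataFrame.melt`, assuming every rating column follows the `YYYY_annual_performance_rating` pattern shown above, keeps the year as its own column; the helper name `stack_ratings` and the toy values (including the rating labels) are illustrative only:

```python
import pandas as pd


def stack_ratings(frame, id_cols=('gender', 'race_ethnicity', 'race_grouping', 'dept')):
    """Reshape the YYYY_annual_performance_rating columns to long form, keeping the year."""
    rating_cols = [c for c in frame.columns if c.endswith('_annual_performance_rating')]
    long_form = frame.melt(
        id_vars=list(id_cols),
        value_vars=rating_cols,
        var_name='year',
        value_name='performance_rating',
    )
    # Column names start with the four-digit year, e.g. '2008_annual_performance_rating'.
    long_form['year'] = long_form['year'].str[:4].astype(int)
    return long_form


# Made-up toy data with two rating years.
toy = pd.DataFrame({
    'gender': ['Female', 'Male'],
    'race_ethnicity': ['Asian (United States of America)', 'White (United States of America)'],
    'race_grouping': ['placeholder', 'placeholder'],
    'dept': ['News', 'Commercial'],
    '2019_annual_performance_rating': ['Meets', 'Exceeds'],
    '2020_annual_performance_rating': ['Exceeds', None],
})
ratings_long = stack_ratings(toy)
```

Applied to the real data, `pd.concat([stack_ratings(df), stack_ratings(df2)], ignore_index=True)` would hold the same ratings as `ratings_combined`, plus a `year` column.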
news_salaried = df[(df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]
news_hourly = df[(df['dept'] == 'News') & (df['pay_rate_type'] == 'Hourly')]
commercial_salaried = df[(df['dept'] == 'Commercial') & (df['pay_rate_type'] == 'Salaried')]
commercial_hourly = df[(df['dept'] == 'Commercial') & (df['pay_rate_type'] == 'Hourly')]
news_salaried2 = df2[(df2['dept'] == 'News') & (df2['pay_rate_type'] == 'Salaried')]
news_hourly2 = df2[(df2['dept'] == 'News') & (df2['pay_rate_type'] == 'Hourly')]
commercial_salaried2 = df2[(df2['dept'] == 'Commercial') & (df2['pay_rate_type'] == 'Salaried')]
commercial_hourly2 = df2[(df2['dept'] == 'Commercial') & (df2['pay_rate_type'] == 'Hourly')]
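The eight department and pay-type filters above can also be produced in one pass with `groupby`. A sketch on made-up toy data; the helper `split_segments` is hypothetical:

```python
import pandas as pd


def split_segments(frame):
    """Return {(dept, pay_rate_type): sub-DataFrame} for every combination present."""
    return {key: group for key, group in frame.groupby(['dept', 'pay_rate_type'])}


toy = pd.DataFrame({
    'dept': ['News', 'News', 'Commercial', 'Commercial'],
    'pay_rate_type': ['Salaried', 'Hourly', 'Salaried', 'Salaried'],
    'current_base_pay': [90000.0, 30.0, 80000.0, 85000.0],
})
segments = split_segments(toy)
```

One difference from the boolean masks: a combination with no rows (here Commercial/Hourly) is simply absent from the dictionary rather than returned as an empty frame, so lookups should use `segments.get(...)`.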
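# Constant helper column: one per employee row.
df['count'] = 1
df2['count'] = 1
def suppress(results):
    """Flatten the (column, aggregate) MultiIndex produced by .agg() and drop groups
    with fewer than 5 nonzero pay values. Note: modifies `results` in place."""
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5]
def suppress_count(results):
    """Like suppress(), sorted by group size, largest first."""
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5].sort_values('count_nonzero', ascending=False)
def suppress_median(results):
    """Like suppress(), sorted by median pay, highest first."""
    results.columns = results.columns.get_level_values(1)
    return results[results['count_nonzero'] >= 5].sort_values('median', ascending=False)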
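To show what the suppression helpers do, here is a small example on made-up data. It uses a non-mutating variant, `suppress_copy` (a hypothetical name), that works on a copy so the aggregated table keeps its two-level columns:

```python
import numpy as np
import pandas as pd


def suppress_copy(results, min_count=5):
    """Non-mutating variant of suppress(): flatten columns, keep groups with >= min_count."""
    flat = results.copy()
    flat.columns = flat.columns.get_level_values(1)  # ('current_base_pay', 'median') -> 'median'
    return flat[flat['count_nonzero'] >= min_count]


# Six women and two men (made-up pay values).
toy = pd.DataFrame({
    'gender': ['Female'] * 6 + ['Male'] * 2,
    'current_base_pay': [90000.0, 95000.0, 100000.0, 105000.0, 110000.0, 115000.0, 98000.0, 99000.0],
})
summary = toy.groupby('gender').agg({'current_base_pay': [np.count_nonzero, 'median']})
visible = suppress_copy(summary)  # the two-person 'Male' group is dropped
```

Working on a copy matters in a notebook: calling the original `suppress()` a second time on the same table would raise an error, because `get_level_values(1)` fails once the columns have already been flattened to a single level.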
current_employee_count = df.shape[0]
terminated_employee_count = df2.shape[0]
print('Total employees in data: ' + str(current_employee_count + terminated_employee_count))
print('Current employees: ' + str(current_employee_count))
print('Terminated employees: ' + str(terminated_employee_count))
Total employees in data: 1466
Current employees: 1003
Terminated employees: 463
current_salaried_employee_count = df[df['pay_rate_type'] == 'Salaried'].shape[0]
terminated_salaried_employee_count = df2[df2['pay_rate_type'] == 'Salaried'].shape[0]
print('Total salaried employees in data: ' + str(current_salaried_employee_count + terminated_salaried_employee_count))
print('Current salaried employees: ' + str(current_salaried_employee_count))
print('Terminated salaried employees: ' + str(terminated_salaried_employee_count))
Total salaried employees in data: 1049
Current salaried employees: 783
Terminated salaried employees: 266
current_hourly_employee_count = df[df['pay_rate_type'] == 'Hourly'].shape[0]
terminated_hourly_employee_count = df2[df2['pay_rate_type'] == 'Hourly'].shape[0]
print('Total hourly employees in data: ' + str(current_hourly_employee_count + terminated_hourly_employee_count))
print('Current hourly employees: ' + str(current_hourly_employee_count))
print('Terminated hourly employees: ' + str(terminated_hourly_employee_count))
Total hourly employees in data: 417
Current hourly employees: 220
Terminated hourly employees: 197
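The three count cells above can be collapsed into one table. A sketch, assuming `current` and `terminated` stand in for `df` and `df2` (toy rows here): tag each file with a status, stack them, and cross-tabulate status against pay type with totals:

```python
import pandas as pd

# Toy stand-ins for df (current) and df2 (terminated).
current = pd.DataFrame({'pay_rate_type': ['Salaried', 'Salaried', 'Hourly']})
terminated = pd.DataFrame({'pay_rate_type': ['Hourly', 'Salaried']})

# Tag each file with its status, stack them, and count every status/pay-type pair.
combined = pd.concat(
    [current.assign(status='Current'), terminated.assign(status='Terminated')],
    ignore_index=True,
)
counts = pd.crosstab(combined['status'], combined['pay_rate_type'],
                     margins=True, margins_name='Total')
```

If the real files contain a pay type other than Salaried or Hourly, it shows up as an extra column here instead of silently dropping out of the totals.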
current_mean_salary = df[df['pay_rate_type'] == 'Salaried']['current_base_pay'].mean()
current_median_salary = df[df['pay_rate_type'] == 'Salaried']['current_base_pay'].median()
# Format as dollars with thousands separators and two decimal places.
print(f'The mean yearly pay for current salaried employees is ${current_mean_salary:,.2f}.')
print(f'The median yearly pay for current salaried employees is ${current_median_salary:,.2f}.')
The mean yearly pay for current salaried employees is $118,500.28.
The median yearly pay for current salaried employees is $105,298.56.
current_mean_hourly = df[df['pay_rate_type'] == 'Hourly']['current_base_pay'].mean()
current_median_hourly = df[df['pay_rate_type'] == 'Hourly']['current_base_pay'].median()
print(f'The mean rate for current hourly employees at The Washington Post is ${current_mean_hourly:,.2f} per hour.')
print(f'The median rate for current hourly employees at The Washington Post is ${current_median_hourly:,.2f} per hour.')
The mean rate for current hourly employees at The Washington Post is $31.94 per hour.
The median rate for current hourly employees at The Washington Post is $31.16 per hour.
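A sketch of computing both statistics for both pay types in a single `groupby` call (made-up values; with the real data, `df` would replace `toy`). Salaries and hourly rates are in different units, so grouping by `pay_rate_type` keeps them apart:

```python
import pandas as pd

toy = pd.DataFrame({
    'pay_rate_type': ['Salaried', 'Salaried', 'Salaried', 'Hourly', 'Hourly'],
    'current_base_pay': [90000.0, 100000.0, 140000.0, 30.0, 34.0],
})
# Mean and median per pay type; yearly salaries and hourly rates are never mixed.
pay_summary = toy.groupby('pay_rate_type')['current_base_pay'].agg(['mean', 'median'])
for pay_type, row in pay_summary.iterrows():
    print(f"{pay_type}: mean ${row['mean']:,.2f}, median ${row['median']:,.2f}")
```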
current_employee_gender = df.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_gender)
| gender | count_nonzero |
|---|---|
| Female | 546 |
| Male | 456 |
terminated_employee_gender = df2.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_gender)
| gender | count_nonzero |
|---|---|
| Female | 261 |
| Male | 201 |
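Counts like these are often easier to compare as shares. A sketch on made-up data that applies the same n >= 5 rule and divides by the file's full headcount (that choice of denominator is an assumption). Note that `value_counts` counts rows, whereas the tables above count nonzero `current_base_pay` values:

```python
import pandas as pd

toy = pd.Series(['Female'] * 12 + ['Male'] * 7 + ['Prefer not to disclose'], name='gender')
counts = toy.value_counts()  # sorted largest group first
kept = counts[counts >= 5]  # same suppression threshold as the tables above
shares = kept / counts.sum()  # share of everyone in the file, suppressed groups included
```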
current_median_salary_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_salary_gender)
| gender | count_nonzero | median |
|---|---|---|
| Female | 421 | 99465.35 |
| Male | 361 | 114921.16 |
current_median_hourly_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_hourly_gender)
| gender | count_nonzero | median |
|---|---|---|
| Female | 125 | 32.65 |
| Male | 95 | 28.65 |
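One way to summarize a median table is each group's median as a fraction of a reference group's. A sketch using the salaried-by-gender values printed above; the helper `median_ratio` is hypothetical, and it compares raw medians only, without adjusting for job, department, tenure, or age:

```python
import pandas as pd


def median_ratio(table, reference):
    """Divide each group's median by the reference group's median."""
    return table['median'] / table.loc[reference, 'median']


# Values copied from the salaried-by-gender table above.
salaried_by_gender = pd.DataFrame(
    {'count_nonzero': [421, 361], 'median': [99465.35, 114921.16]},
    index=pd.Index(['Female', 'Male'], name='gender'),
)
ratios = median_ratio(salaried_by_gender, reference='Male')
```

The same helper applies to the race and ethnicity tables, for example with `reference='White (United States of America)'`.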
current_age_gender_salaried = df[df['pay_rate_type'] == 'Salaried'].groupby(['gender'])['age'].median().sort_values(ascending=False)
current_age_gender_salaried
| gender | median age |
|---|---|
| Male | 42.00 |
| Female | 35.00 |
| Prefer not to disclose | 30.00 |
current_employee_race_ethnicity = df.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_race_ethnicity)
| race_ethnicity | count_nonzero |
|---|---|
| White (United States of America) | 624 |
| Black or African American (United States of America) | 159 |
| Asian (United States of America) | 91 |
| Hispanic or Latino (United States of America) | 54 |
| Two or More Races (United States of America) | 28 |
| Prefer Not to Disclose (United States of America) | 15 |
terminated_employee_race_ethnicity = df2.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(terminated_employee_race_ethnicity)
| race_ethnicity | count_nonzero |
|---|---|
| White (United States of America) | 252 |
| Black or African American (United States of America) | 119 |
| Asian (United States of America) | 45 |
| Hispanic or Latino (United States of America) | 23 |
| Two or More Races (United States of America) | 13 |
| Prefer Not to Disclose (United States of America) | 6 |
current_median_salary_race = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_median_salary_race)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 525 | 110453.45 |
Black or African American (United States of America) | 76 | 95000.00 |
Prefer Not to Disclose (United States of America) | 13 | 95000.00 |
Asian (United States of America) | 72 | 94920.00 |
Hispanic or Latino (United States of America) | 47 | 94780.00 |
Two or More Races (United States of America) | 21 | 90000.00 |
current_median_hourly_race = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_median_hourly_race)
race_ethnicity | count_nonzero | median |
---|---|---|
Two or More Races (United States of America) | 7 | 35.00 |
White (United States of America) | 99 | 33.77 |
Asian (United States of America) | 19 | 31.92 |
Black or African American (United States of America) | 83 | 27.88 |
Hispanic or Latino (United States of America) | 7 | 26.56 |
current_age_race_salaried = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_age_race_salaried
race_ethnicity | median age |
---|---|
American Indian or Alaska Native (United States of America) | 46.00 |
Native Hawaiian or Other Pacific Islander (United States of America) | 45.00 |
Black or African American (United States of America) | 40.50 |
White (United States of America) | 39.00 |
Hispanic or Latino (United States of America) | 35.00 |
Asian (United States of America) | 33.00 |
Prefer Not to Disclose (United States of America) | 31.00 |
Two or More Races (United States of America) | 29.00 |
current_age_race_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_age_race_hourly
race_ethnicity | median age |
---|---|
American Indian or Alaska Native (United States of America) | 55.50 |
Black or African American (United States of America) | 47.00 |
White (United States of America) | 38.00 |
Prefer Not to Disclose (United States of America) | 34.00 |
Two or More Races (United States of America) | 32.00 |
Hispanic or Latino (United States of America) | 29.00 |
Asian (United States of America) | 28.00 |
current_employee_race_gender = df.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_race_gender)
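race_ethnicity | gender | count_nonzero |
---|---|---|
Asian (United States of America) | Female | 64 |
Asian (United States of America) | Male | 27 |
Black or African American (United States of America) | Female | 84 |
Black or African American (United States of America) | Male | 75 |
Hispanic or Latino (United States of America) | Female | 29 |
Hispanic or Latino (United States of America) | Male | 25 |
Prefer Not to Disclose (United States of America) | Female | 6 |
Prefer Not to Disclose (United States of America) | Male | 9 |
Two or More Races (United States of America) | Female | 20 |
Two or More Races (United States of America) | Male | 8 |
White (United States of America) | Female | 325 |
White (United States of America) | Male | 298 |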
current_salaried_race_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_salaried_race_gender)
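race_ethnicity | gender | count_nonzero |
---|---|---|
Asian (United States of America) | Female | 51 |
Asian (United States of America) | Male | 21 |
Black or African American (United States of America) | Female | 40 |
Black or African American (United States of America) | Male | 36 |
Hispanic or Latino (United States of America) | Female | 24 |
Hispanic or Latino (United States of America) | Male | 23 |
Prefer Not to Disclose (United States of America) | Female | 6 |
Prefer Not to Disclose (United States of America) | Male | 7 |
Two or More Races (United States of America) | Female | 15 |
Two or More Races (United States of America) | Male | 6 |
White (United States of America) | Female | 269 |
White (United States of America) | Male | 255 |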
current_hourly_race_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_hourly_race_gender)
race_ethnicity | gender | count_nonzero |
---|---|---|
Asian (United States of America) | Female | 13 |
Asian (United States of America) | Male | 6 |
Black or African American (United States of America) | Female | 44 |
Black or African American (United States of America) | Male | 39 |
Hispanic or Latino (United States of America) | Female | 5 |
Two or More Races (United States of America) | Female | 5 |
White (United States of America) | Female | 56 |
White (United States of America) | Male | 43 |
current_median_salary_race_gender = df[df['pay_rate_type'] == 'Salaried'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_salary_race_gender)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
Asian (United States of America) | Female | 51 | 94840.00 |
Asian (United States of America) | Male | 21 | 95691.99 |
Black or African American (United States of America) | Female | 40 | 91959.90 |
Black or African American (United States of America) | Male | 36 | 99375.00 |
Hispanic or Latino (United States of America) | Female | 24 | 91254.94 |
Hispanic or Latino (United States of America) | Male | 23 | 98411.56 |
Prefer Not to Disclose (United States of America) | Female | 6 | 92500.00 |
Prefer Not to Disclose (United States of America) | Male | 7 | 98340.00 |
Two or More Races (United States of America) | Female | 15 | 90780.00 |
Two or More Races (United States of America) | Male | 6 | 86890.00 |
White (United States of America) | Female | 269 | 104780.00 |
White (United States of America) | Male | 255 | 117131.00 |
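One way to read the table above is to divide women's median salary by men's within each race/ethnicity group. The sketch below does that with `unstack`, using five of the medians shown above with shortened race labels.

It is only a raw ratio of medians. It does not account for years of service, desk or role. Some cells are also small; for example, the Two or More Races group has six salaried men. Like every other table here, it should not be read on its own.

```python
import pandas as pd

# Salaried medians copied from the race/gender table above (labels shortened).
groups = ['Asian', 'Black or African American', 'Hispanic or Latino',
          'Two or More Races', 'White']
medians = pd.DataFrame({
    'race_ethnicity': [g for g in groups for _ in range(2)],
    'gender': ['Female', 'Male'] * len(groups),
    'median': [94840.00, 95691.99, 91959.90, 99375.00, 91254.94,
               98411.56, 90780.00, 86890.00, 104780.00, 117131.00],
}).set_index(['race_ethnicity', 'gender'])['median']

# One row per race/ethnicity, one column per gender.
wide = medians.unstack('gender')
# Women's median as a share of men's median within each group.
ratio = (wide['Female'] / wide['Male']).round(3)
print(ratio)  # White -> 0.895, Two or More Races -> 1.045
```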
current_median_hourly_race_gender = df[df['pay_rate_type'] == 'Hourly'].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_hourly_race_gender)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
Asian (United States of America) | Female | 13 | 31.92 |
Asian (United States of America) | Male | 6 | 32.83 |
Black or African American (United States of America) | Female | 44 | 29.42 |
Black or African American (United States of America) | Male | 39 | 26.66 |
Hispanic or Latino (United States of America) | Female | 5 | 33.85 |
Two or More Races (United States of America) | Female | 5 | 35.90 |
White (United States of America) | Female | 56 | 35.90 |
White (United States of America) | Male | 43 | 32.12 |
current_employee_age_5 = df.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_age_5)
age_group_5 | count_nonzero |
---|---|
<25 | 43 |
25-29 | 193 |
30-34 | 147 |
35-39 | 150 |
40-44 | 94 |
45-49 | 85 |
50-54 | 92 |
55-59 | 87 |
60-64 | 70 |
65+ | 42 |
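The `age_group_5` and `age_group_10` columns are built before this section. One plausible construction uses `pd.cut` with left-closed bins. This is an assumption, not the notebook's code.

An ordered categorical like this would also explain why '<25' comes first in the tables. A plain string sort would place it last, because '<' sorts after the digits.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the age bins; the notebook may build them differently.
AGE_5_EDGES = [-np.inf, 25, 30, 35, 40, 45, 50, 55, 60, 65, np.inf]
AGE_5_LABELS = ['<25', '25-29', '30-34', '35-39', '40-44',
                '45-49', '50-54', '55-59', '60-64', '65+']
AGE_10_EDGES = [-np.inf, 25, 35, 45, 55, 65, np.inf]
AGE_10_LABELS = ['<25', '25-34', '35-44', '45-54', '55-64', '65+']


def age_groups(ages, edges, labels):
    # right=False makes each bin include its lower edge: [25, 30) -> '25-29'.
    # The result is an ordered categorical, so groupby keeps the bin order.
    return pd.cut(ages, bins=edges, labels=labels, right=False)


ages = pd.Series([24, 25, 29, 30, 64, 65, 71])
g5 = age_groups(ages, AGE_5_EDGES, AGE_5_LABELS).tolist()
g10 = age_groups(ages, AGE_10_EDGES, AGE_10_LABELS).tolist()
print(g5)
print(g10)
```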
terminated_employee_age_5 = df2.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_age_5)
age_group_5 | count_nonzero |
---|---|
<25 | 8 |
25-29 | 89 |
30-34 | 110 |
35-39 | 64 |
40-44 | 43 |
45-49 | 25 |
50-54 | 25 |
55-59 | 37 |
60-64 | 21 |
65+ | 40 |
current_employee_age_10 = df.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_age_10)
age_group_10 | count_nonzero |
---|---|
<25 | 43 |
25-34 | 340 |
35-44 | 244 |
45-54 | 177 |
55-64 | 157 |
65+ | 42 |
terminated_employee_age_10 = df2.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_age_10)
age_group_10 | count_nonzero |
---|---|
<25 | 8 |
25-34 | 199 |
35-44 | 107 |
45-54 | 50 |
55-64 | 58 |
65+ | 40 |
current_median_salary_age_5 = df[df['pay_rate_type'] == 'Salaried'].groupby(['age_group_5']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_salary_age_5)
age_group_5 | median | count_nonzero |
---|---|---|
<25 | 72250.00 | 22 |
25-29 | 83389.64 | 150 |
30-34 | 95104.86 | 120 |
35-39 | 115000.00 | 137 |
40-44 | 128280.00 | 77 |
45-49 | 126572.33 | 63 |
50-54 | 117924.49 | 77 |
55-59 | 117935.68 | 61 |
60-64 | 145686.55 | 52 |
65+ | 120304.93 | 24 |
current_median_hourly_age_5 = df[df['pay_rate_type'] == 'Hourly'].groupby(['age_group_5']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_hourly_age_5)
age_group_5 | median | count_nonzero |
---|---|---|
<25 | 31.79 | 21 |
25-29 | 30.14 | 43 |
30-34 | 33.33 | 27 |
35-39 | 33.08 | 13 |
40-44 | 28.74 | 17 |
45-49 | 32.27 | 22 |
50-54 | 28.61 | 15 |
55-59 | 29.70 | 26 |
60-64 | 27.68 | 18 |
65+ | 31.26 | 18 |
current_median_salary_age_10 = df[df['pay_rate_type'] == 'Salaried'].groupby(['age_group_10']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_salary_age_10)
age_group_10 | median | count_nonzero |
---|---|---|
<25 | 72250.00 | 22 |
25-34 | 89980.00 | 270 |
35-44 | 117780.00 | 214 |
45-54 | 124398.47 | 140 |
55-64 | 129999.56 | 113 |
65+ | 120304.93 | 24 |
current_median_hourly_age_10 = df[df['pay_rate_type'] == 'Hourly'].groupby(['age_group_10']).agg({'current_base_pay': [np.median, np.count_nonzero]})
suppress(current_median_hourly_age_10)
age_group_10 | median | count_nonzero |
---|---|---|
<25 | 31.79 | 21 |
25-34 | 32.23 | 70 |
35-44 | 31.36 | 30 |
45-54 | 31.06 | 37 |
55-64 | 29.34 | 44 |
65+ | 31.26 | 18 |
current_employee_dept = df.groupby(['dept']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_dept)
dept | count_nonzero |
---|---|
News | 740 |
Commercial | 263 |
current_employee_department = df.groupby(['department']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_department)
department | count_nonzero |
---|---|
News | 699 |
Client Solutions | 139 |
Editorial | 41 |
Production | 38 |
Finance | 32 |
Audience Development and Insights | 18 |
Customer Care and Logistics | 12 |
Washington Post Live | 9 |
Marketing | 7 |
Public Relations | 7 |
current_employee_dept_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['dept']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_dept_salary)
dept | count_nonzero | median |
---|---|---|
News | 657 | 109884.02 |
Commercial | 126 | 90000.00 |
current_employee_department_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_department_salary)
department | count_nonzero | median |
---|---|---|
Editorial | 33 | 128560.27 |
News | 624 | 109577.50 |
Finance | 9 | 95000.00 |
Client Solutions | 91 | 90241.58 |
Audience Development and Insights | 18 | 90000.00 |
Production | 5 | 75000.51 |
current_employee_dept_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['dept']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_dept_hourly)
dept | count_nonzero | median |
---|---|---|
News | 83 | 34.28 |
Commercial | 137 | 29.18 |
current_employee_department_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_department_hourly)
department | count_nonzero | median |
---|---|---|
Editorial | 8 | 43.27 |
Marketing | 7 | 39.64 |
Public Relations | 7 | 38.40 |
News | 75 | 33.85 |
Washington Post Live | 7 | 33.33 |
Client Solutions | 48 | 31.23 |
Finance | 23 | 30.26 |
Production | 33 | 25.41 |
Customer Care and Logistics | 12 | 21.67 |
current_employee_desk = df.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_desk)
desk | count_nonzero |
---|---|
non-newsroom | 281 |
National | 125 |
Local | 72 |
Sports | 59 |
Style | 56 |
Multiplatform | 51 |
Video | 49 |
Financial | 43 |
Editorial | 41 |
Foreign | 34 |
Design | 34 |
Audience Development and Engagement | 32 |
Photography | 27 |
Emerging News Products | 26 |
Graphics | 21 |
Universal Desk | 14 |
Audio | 12 |
Operations | 10 |
Outlook | 8 |
current_employee_cost_center = df.groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_employee_cost_center)
cost_center_current | count_nonzero |
---|---|
112300 Local Politics and Government | 72 |
113200 National Politics and Government | 61 |
110015 Sports Main | 59 |
110601 Multiplatform Desk | 51 |
110652 News Video - General | 49 |
113210 Economy and Business | 43 |
110300 Style | 42 |
115000 Editorial Administration | 41 |
110604 Presentation Design | 34 |
119065 Dispatch Operations (Night Circulation) | 32 |
110610 Audience Development and Engagement | 32 |
110605 Presentation | 27 |
110603 Presentation Graphics | 21 |
126020 Revenue Administration | 19 |
113205 National Security | 18 |
117682 Global Sales | 18 |
116010 Research | 18 |
110450 Investigative | 17 |
117693 Digital Ad Sales - Planning | 16 |
113215 News National Health & Science | 16 |
117694 Digital Ad Sales - BrandStudio | 15 |
113235 National America | 15 |
110600 Universal Desk | 14 |
110664 News National Apps | 12 |
110620 News Audio | 12 |
114000 Foreign Administration | 10 |
117525 National Retailers | 10 |
117004 Advertising Marketing | 10 |
110000 News Operations | 10 |
117720 Health | 8 |
118150 WP Live | 8 |
113240 News National Environment | 8 |
117005 Creative Services | 7 |
129100 Community | 7 |
110666 News Snapchat | 6 |
120005 Makeup | 6 |
126010 General Ledger | 6 |
110435 Food | 6 |
117405 Jobs Tactical | 5 |
117310 Consumer to Consumer Team I | 5 |
117900 Agency Partner | 5 |
110500 Magazine | 5 |
119026 Customer Contact Center | 5 |
110460 Outlook | 5 |
126060 Circulation Accounting | 5 |
128150 Consumer Mktg - Digital Subscription | 5 |
current_employee_desk_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_desk_salary)
desk | count_nonzero | median |
---|---|---|
National | 114 | 158242.11 |
Foreign | 33 | 142780.00 |
Financial | 43 | 137140.00 |
Editorial | 33 | 128560.27 |
Local | 68 | 113140.00 |
Style | 50 | 111833.47 |
Universal Desk | 6 | 107876.01 |
Sports | 43 | 107560.00 |
Outlook | 6 | 105497.50 |
Graphics | 21 | 104320.14 |
Photography | 27 | 98340.00 |
non-newsroom | 144 | 94825.00 |
Video | 46 | 94420.00 |
Audio | 8 | 93140.00 |
Multiplatform | 44 | 90810.00 |
Audience Development and Engagement | 28 | 89080.00 |
Design | 33 | 88000.00 |
Emerging News Products | 26 | 75000.00 |
current_employee_cost_center_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_cost_center_salary)
cost_center_current | count_nonzero | median |
---|---|---|
113205 National Security | 18 | 180737.50 |
117682 Global Sales | 17 | 174956.00 |
113215 News National Health & Science | 14 | 164322.11 |
113200 National Politics and Government | 55 | 157840.00 |
110450 Investigative | 17 | 144340.00 |
113210 Economy and Business | 43 | 137140.00 |
113240 News National Environment | 8 | 133387.50 |
117900 Agency Partner | 5 | 129999.56 |
113235 National America | 15 | 129990.00 |
115000 Editorial Administration | 33 | 128560.27 |
114000 Foreign Administration | 9 | 125246.54 |
110300 Style | 39 | 114071.94 |
112300 Local Politics and Government | 68 | 113140.00 |
110435 Food | 5 | 108618.00 |
110600 Universal Desk | 6 | 107876.01 |
110015 Sports Main | 43 | 107560.00 |
117525 National Retailers | 8 | 106385.36 |
110603 Presentation Graphics | 21 | 104320.14 |
110460 Outlook | 5 | 98435.00 |
110605 Presentation | 27 | 98340.00 |
117694 Digital Ad Sales - BrandStudio | 13 | 96975.00 |
126010 General Ledger | 6 | 95585.00 |
110652 News Video - General | 46 | 94420.00 |
110620 News Audio | 8 | 93140.00 |
117720 Health | 8 | 92703.74 |
110601 Multiplatform Desk | 44 | 90810.00 |
116010 Research | 18 | 90000.00 |
110610 Audience Development and Engagement | 28 | 89080.00 |
110604 Presentation Design | 33 | 88000.00 |
117004 Advertising Marketing | 7 | 85045.10 |
117005 Creative Services | 6 | 78229.70 |
120005 Makeup | 5 | 75000.51 |
117693 Digital Ad Sales - Planning | 16 | 74790.30 |
110666 News Snapchat | 6 | 72720.00 |
110664 News National Apps | 12 | 71926.50 |
current_employee_desk_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_desk_hourly)
desk | count_nonzero | median |
---|---|---|
Universal Desk | 8 | 44.16 |
Editorial | 8 | 43.27 |
National | 11 | 33.85 |
Multiplatform | 7 | 32.77 |
non-newsroom | 137 | 29.18 |
Sports | 16 | 27.74 |
Operations | 8 | 22.68 |
Style | 6 | 20.95 |
current_employee_cost_center_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['cost_center_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_employee_cost_center_hourly)
cost_center_current | count_nonzero | median |
---|---|---|
110600 Universal Desk | 8 | 44.16 |
115000 Editorial Administration | 8 | 43.27 |
128150 Consumer Mktg - Digital Subscription | 5 | 39.64 |
129100 Community | 7 | 38.40 |
118150 WP Live | 6 | 34.10 |
113200 National Politics and Government | 6 | 33.59 |
110601 Multiplatform Desk | 7 | 32.77 |
126020 Revenue Administration | 19 | 30.14 |
117310 Consumer to Consumer Team I | 5 | 28.61 |
110015 Sports Main | 16 | 27.74 |
117405 Jobs Tactical | 5 | 25.50 |
119065 Dispatch Operations (Night Circulation) | 32 | 25.32 |
110000 News Operations | 8 | 22.68 |
119026 Customer Contact Center | 5 | 21.71 |
current_employee_yos = df.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos)
years_of_service_grouped | count_nonzero |
---|---|
0 | 99 |
1-2 | 234 |
3-5 | 213 |
6-10 | 161 |
11-15 | 77 |
16-20 | 79 |
21-25 | 78 |
25+ | 62 |
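`years_of_service_grouped` is also built earlier in the notebook. Its labels '21-25' and '25+' overlap at 25, so the sketch below makes two assumptions:

- tenure is counted in whole completed years;
- '25+' means 26 or more.

The notebook's own grouping may differ.

```python
import numpy as np
import pandas as pd

# Assumed bins: right-closed, so (2, 5] -> '3-5' and (25, inf) -> '25+'.
YOS_EDGES = [-np.inf, 0, 2, 5, 10, 15, 20, 25, np.inf]
YOS_LABELS = ['0', '1-2', '3-5', '6-10', '11-15', '16-20', '21-25', '25+']

years = pd.Series([0, 1, 2, 3, 5, 6, 10, 25, 26])  # whole years, illustrative
yos_group = pd.cut(years, bins=YOS_EDGES, labels=YOS_LABELS)
print(yos_group.tolist())
```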
terminated_employee_yos = df2.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero]})
suppress(terminated_employee_yos)
years_of_service_grouped | count_nonzero |
---|---|
1-2 | 50 |
3-5 | 181 |
6-10 | 128 |
11-15 | 32 |
16-20 | 29 |
21-25 | 22 |
25+ | 18 |
current_employee_yos_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_salary)
years_of_service_grouped | count_nonzero | median |
---|---|---|
0 | 70 | 90000.00 |
1-2 | 182 | 90260.79 |
3-5 | 178 | 100780.00 |
6-10 | 135 | 115162.62 |
11-15 | 55 | 116560.00 |
16-20 | 58 | 118372.75 |
21-25 | 60 | 135024.69 |
25+ | 45 | 131075.90 |
current_employee_yos_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_hourly)
years_of_service_grouped | count_nonzero | median |
---|---|---|
0 | 29 | 34.87 |
1-2 | 52 | 28.81 |
3-5 | 35 | 29.66 |
6-10 | 26 | 31.03 |
11-15 | 22 | 31.63 |
16-20 | 21 | 31.92 |
21-25 | 18 | 32.27 |
25+ | 17 | 32.21 |
current_employee_yos_gender = df.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos_gender)
years_of_service_grouped | gender | count_nonzero |
---|---|---|
0 | Female | 68 |
0 | Male | 31 |
1-2 | Female | 145 |
1-2 | Male | 88 |
3-5 | Female | 114 |
3-5 | Male | 99 |
6-10 | Female | 72 |
6-10 | Male | 89 |
11-15 | Female | 32 |
11-15 | Male | 45 |
16-20 | Female | 46 |
16-20 | Male | 33 |
21-25 | Female | 36 |
21-25 | Male | 42 |
25+ | Female | 33 |
25+ | Male | 29 |
current_employee_yos_gender_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_gender_salary)
years_of_service_grouped | gender | count_nonzero | median |
---|---|---|---|
0 | Female | 46 | 89000.00 |
0 | Male | 24 | 93500.00 |
1-2 | Female | 116 | 88321.30 |
1-2 | Male | 65 | 95000.00 |
3-5 | Female | 97 | 95000.00 |
3-5 | Male | 81 | 107855.60 |
6-10 | Female | 64 | 111074.25 |
6-10 | Male | 71 | 123340.00 |
11-15 | Female | 25 | 104618.47 |
11-15 | Male | 30 | 134800.96 |
16-20 | Female | 33 | 113809.68 |
16-20 | Male | 25 | 129828.93 |
21-25 | Female | 23 | 143908.85 |
21-25 | Male | 37 | 129951.95 |
25+ | Female | 17 | 126937.16 |
25+ | Male | 28 | 132556.04 |
current_employee_yos_gender_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_gender_hourly)
years_of_service_grouped | gender | count_nonzero | median |
---|---|---|---|
0 | Female | 22 | 35.45 |
0 | Male | 7 | 29.23 |
1-2 | Female | 29 | 30.14 |
1-2 | Male | 23 | 23.83 |
3-5 | Female | 17 | 34.61 |
3-5 | Male | 18 | 26.37 |
6-10 | Female | 8 | 43.36 |
6-10 | Male | 18 | 28.12 |
11-15 | Female | 7 | 32.51 |
11-15 | Male | 15 | 28.96 |
16-20 | Female | 13 | 31.92 |
16-20 | Male | 8 | 31.69 |
21-25 | Female | 13 | 29.74 |
21-25 | Male | 5 | 34.77 |
25+ | Female | 16 | 32.84 |
current_employee_yos_race = df.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_employee_yos_race)
years_of_service_grouped | race_ethnicity | count_nonzero |
---|---|---|
0 | Asian (United States of America) | 13 |
0 | Black or African American (United States of America) | 22 |
0 | Hispanic or Latino (United States of America) | 12 |
0 | Two or More Races (United States of America) | 8 |
0 | White (United States of America) | 39 |
1-2 | Asian (United States of America) | 31 |
1-2 | Black or African American (United States of America) | 30 |
1-2 | Hispanic or Latino (United States of America) | 14 |
1-2 | Prefer Not to Disclose (United States of America) | 9 |
1-2 | Two or More Races (United States of America) | 10 |
1-2 | White (United States of America) | 133 |
3-5 | Asian (United States of America) | 16 |
3-5 | Black or African American (United States of America) | 24 |
3-5 | Hispanic or Latino (United States of America) | 14 |
3-5 | Prefer Not to Disclose (United States of America) | 5 |
3-5 | Two or More Races (United States of America) | 8 |
3-5 | White (United States of America) | 142 |
6-10 | Asian (United States of America) | 12 |
6-10 | Black or African American (United States of America) | 20 |
6-10 | Hispanic or Latino (United States of America) | 10 |
6-10 | White (United States of America) | 108 |
11-15 | Asian (United States of America) | 5 |
11-15 | Black or African American (United States of America) | 13 |
11-15 | White (United States of America) | 55 |
16-20 | Asian (United States of America) | 5 |
16-20 | Black or African American (United States of America) | 19 |
16-20 | White (United States of America) | 51 |
21-25 | Black or African American (United States of America) | 17 |
21-25 | White (United States of America) | 56 |
25+ | Asian (United States of America) | 5 |
25+ | Black or African American (United States of America) | 14 |
25+ | White (United States of America) | 40 |
current_employee_yos_race_salary = df[df['pay_rate_type'] == 'Salaried'].groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_race_salary)
years_of_service_grouped | race_ethnicity | count_nonzero | median |
---|---|---|---|
0 | Asian (United States of America) | 8 | 87500.00 |
0 | Black or African American (United States of America) | 14 | 91000.00 |
0 | Hispanic or Latino (United States of America) | 10 | 96250.00 |
0 | Two or More Races (United States of America) | 5 | 90000.00 |
0 | White (United States of America) | 28 | 85000.00 |
1-2 | Asian (United States of America) | 24 | 86662.55 |
1-2 | Black or African American (United States of America) | 13 | 92780.00 |
1-2 | Hispanic or Latino (United States of America) | 12 | 80890.00 |
1-2 | Prefer Not to Disclose (United States of America) | 7 | 94780.00 |
1-2 | Two or More Races (United States of America) | 8 | 81890.00 |
1-2 | White (United States of America) | 111 | 92000.00 |
3-5 | Asian (United States of America) | 15 | 95000.00 |
3-5 | Black or African American (United States of America) | 12 | 132000.00 |
3-5 | Hispanic or Latino (United States of America) | 13 | 93060.00 |
3-5 | Prefer Not to Disclose (United States of America) | 5 | 95000.00 |
3-5 | Two or More Races (United States of America) | 7 | 91400.00 |
3-5 | White (United States of America) | 123 | 104000.00 |
6-10 | Asian (United States of America) | 11 | 110148.50 |
6-10 | Black or African American (United States of America) | 11 | 96170.00 |
6-10 | Hispanic or Latino (United States of America) | 9 | 94780.00 |
6-10 | White (United States of America) | 94 | 117533.34 |
11-15 | White (United States of America) | 46 | 117080.00 |
16-20 | Black or African American (United States of America) | 11 | 104280.09 |
16-20 | White (United States of America) | 41 | 121316.10 |
21-25 | Black or African American (United States of America) | 7 | 91280.34 |
21-25 | White (United States of America) | 49 | 138119.40 |
25+ | Black or African American (United States of America) | 5 | 84169.27 |
25+ | White (United States of America) | 33 | 130189.42 |
current_employee_yos_race_hourly = df[df['pay_rate_type'] == 'Hourly'].groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_employee_yos_race_hourly)
years_of_service_grouped | race_ethnicity | count_nonzero | median |
---|---|---|---|
0 | Asian (United States of America) | 5 | 37.44 |
0 | Black or African American (United States of America) | 8 | 29.69 |
0 | White (United States of America) | 11 | 35.90 |
1-2 | Asian (United States of America) | 7 | 28.61 |
1-2 | Black or African American (United States of America) | 17 | 29.01 |
1-2 | White (United States of America) | 22 | 32.53 |
3-5 | Black or African American (United States of America) | 12 | 27.18 |
3-5 | White (United States of America) | 19 | 33.77 |
6-10 | Black or African American (United States of America) | 9 | 26.75 |
6-10 | White (United States of America) | 14 | 32.28 |
11-15 | Black or African American (United States of America) | 10 | 30.05 |
11-15 | White (United States of America) | 9 | 32.12 |
16-20 | Black or African American (United States of America) | 8 | 25.40 |
16-20 | White (United States of America) | 10 | 36.42 |
21-25 | Black or African American (United States of America) | 10 | 30.91 |
21-25 | White (United States of America) | 7 | 35.69 |
25+ | Black or African American (United States of America) | 9 | 26.79 |
25+ | White (United States of America) | 7 | 37.90 |
fifteen = pd.concat([fifteen1,fifteen2])
fifteenrating_gender = fifteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_gender
gender | median rating |
---|---|
Female | 3.40 |
Male | 3.40 |
Prefer not to disclose | NaN |
sixteen = pd.concat([sixteen1,sixteen2])
sixteenrating_gender = sixteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_gender
gender | median rating |
---|---|
Female | 3.30 |
Male | 3.30 |
Prefer not to disclose | NaN |
seventeen = pd.concat([seventeen1,seventeen2])
seventeenrating_gender = seventeen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_gender
gender | median rating |
---|---|
Female | 3.40 |
Male | 3.40 |
Prefer not to disclose | NaN |
eighteen = pd.concat([eighteen1,eighteen2])
eighteenrating_gender = eighteen.groupby(['gender'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_gender
gender | median rating |
---|---|
Female | 3.40 |
Male | 3.40 |
Prefer not to disclose | 3.20 |
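The four rating blocks above repeat one computation per year. A hypothetical helper can lay the years side by side in a single table. The mapping of `fifteen` to 2015, `sixteen` to 2016 and so on is an assumption based on the variable names.

```python
import pandas as pd


def median_rating_by_year(frames_by_year, keys):
    """Median performance_rating per group, one column per review year.

    Hypothetical helper; frames_by_year maps a year to a DataFrame shaped
    like the fifteen/sixteen/seventeen/eighteen frames, e.g.
    {2015: fifteen, 2016: sixteen, 2017: seventeen, 2018: eighteen}.
    """
    per_year = {
        year: frame.groupby(keys)['performance_rating'].median()
        for year, frame in frames_by_year.items()
    }
    # Aligning the Series side by side leaves NaN where a group has no ratings.
    return pd.DataFrame(per_year)


# Toy ratings for illustration only.
f15 = pd.DataFrame({'gender': ['Female', 'Female', 'Male'],
                    'performance_rating': [3.2, 3.6, 3.4]})
f16 = pd.DataFrame({'gender': ['Female', 'Male', 'Male'],
                    'performance_rating': [3.3, 3.1, 3.5]})
table = median_rating_by_year({2015: f15, 2016: f16}, 'gender')
print(table)
```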
fifteenrating_race_ethnicity = fifteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_race_ethnicity
race_ethnicity | median rating |
---|---|
American Indian or Alaska Native (United States of America) | 3.50 |
Asian (United States of America) | 3.40 |
White (United States of America) | 3.40 |
Prefer Not to Disclose (United States of America) | 3.30 |
Two or More Races (United States of America) | 3.30 |
Native Hawaiian or Other Pacific Islander (United States of America) | 3.25 |
Hispanic or Latino (United States of America) | 3.20 |
Black or African American (United States of America) | 3.10 |
sixteenrating_race_ethnicity = sixteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_race_ethnicity
race_ethnicity | median rating |
---|---|
Native Hawaiian or Other Pacific Islander (United States of America) | 3.70 |
Asian (United States of America) | 3.40 |
White (United States of America) | 3.40 |
Prefer Not to Disclose (United States of America) | 3.30 |
American Indian or Alaska Native (United States of America) | 3.25 |
Black or African American (United States of America) | 3.20 |
Hispanic or Latino (United States of America) | 3.20 |
Two or More Races (United States of America) | 3.20 |
seventeenrating_race_ethnicity = seventeen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_race_ethnicity
race_ethnicity | median rating |
---|---|
American Indian or Alaska Native (United States of America) | 3.55 |
Native Hawaiian or Other Pacific Islander (United States of America) | 3.50 |
Asian (United States of America) | 3.40 |
Prefer Not to Disclose (United States of America) | 3.40 |
White (United States of America) | 3.40 |
Hispanic or Latino (United States of America) | 3.30 |
Two or More Races (United States of America) | 3.30 |
Black or African American (United States of America) | 3.20 |
eighteenrating_race_ethnicity = eighteen.groupby(['race_ethnicity'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_race_ethnicity
race_ethnicity | median rating |
---|---|
American Indian or Alaska Native (United States of America) | 3.55 |
White (United States of America) | 3.50 |
Asian (United States of America) | 3.40 |
Native Hawaiian or Other Pacific Islander (United States of America) | 3.40 |
Black or African American (United States of America) | 3.30 |
Hispanic or Latino (United States of America) | 3.30 |
Prefer Not to Disclose (United States of America) | 3.30 |
Two or More Races (United States of America) | 3.30 |
fifteenrating_gender_race = fifteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
fifteenrating_gender_race
race_ethnicity | gender | median rating |
---|---|---|
American Indian or Alaska Native (United States of America) | Female | 3.50 |
Asian (United States of America) | Male | 3.50 |
White (United States of America) | Male | 3.50 |
American Indian or Alaska Native (United States of America) | Male | 3.40 |
Asian (United States of America) | Female | 3.40 |
White (United States of America) | Female | 3.40 |
Native Hawaiian or Other Pacific Islander (United States of America) | Male | 3.30 |
Prefer Not to Disclose (United States of America) | Female | 3.30 |
Two or More Races (United States of America) | Female | 3.30 |
Hispanic or Latino (United States of America) | Female | 3.25 |
Black or African American (United States of America) | Female | 3.20 |
Hispanic or Latino (United States of America) | Male | 3.20 |
Native Hawaiian or Other Pacific Islander (United States of America) | Female | 3.20 |
Black or African American (United States of America) | Male | 3.00 |
Two or More Races (United States of America) | Male | 2.75 |
Prefer Not to Disclose (United States of America) | Male | NaN |
White (United States of America) | Prefer not to disclose | NaN |
sixteenrating_gender_race = sixteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
sixteenrating_gender_race
race_ethnicity | gender | median rating |
---|---|---|
Native Hawaiian or Other Pacific Islander (United States of America) | Female | 4.10 |
Asian (United States of America) | Female | 3.40 |
White (United States of America) | Female | 3.40 |
White (United States of America) | Male | 3.40 |
American Indian or Alaska Native (United States of America) | Female | 3.30 |
Asian (United States of America) | Male | 3.30 |
Native Hawaiian or Other Pacific Islander (United States of America) | Male | 3.30 |
Prefer Not to Disclose (United States of America) | Female | 3.30 |
Black or African American (United States of America) | Female | 3.25 |
American Indian or Alaska Native (United States of America) | Male | 3.20 |
Hispanic or Latino (United States of America) | Female | 3.20 |
Two or More Races (United States of America) | Female | 3.20 |
Hispanic or Latino (United States of America) | Male | 3.15 |
Black or African American (United States of America) | Male | 3.10 |
Two or More Races (United States of America) | Male | 2.70 |
Prefer Not to Disclose (United States of America) | Male | NaN |
White (United States of America) | Prefer not to disclose | NaN |
seventeenrating_gender_race = seventeen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
seventeenrating_gender_race
race_ethnicity | gender | performance_rating |
---|---|---|
Native Hawaiian or Other Pacific Islander (United States of America) | Female | 4.00 |
American Indian or Alaska Native (United States of America) | Female | 3.70 |
Prefer Not to Disclose (United States of America) | Female | 3.50 |
Asian (United States of America) | Female | 3.40 |
White (United States of America) | Female | 3.40 |
White (United States of America) | Male | 3.40 |
Asian (United States of America) | Male | 3.35 |
Hispanic or Latino (United States of America) | Female | 3.30 |
Hispanic or Latino (United States of America) | Male | 3.30 |
Two or More Races (United States of America) | Female | 3.30 |
Two or More Races (United States of America) | Male | 3.30 |
Black or African American (United States of America) | Female | 3.20 |
Prefer Not to Disclose (United States of America) | Male | 3.20 |
American Indian or Alaska Native (United States of America) | Male | 3.10 |
Black or African American (United States of America) | Male | 3.10 |
Native Hawaiian or Other Pacific Islander (United States of America) | Male | 3.00 |
White (United States of America) | Prefer not to disclose | NaN |
eighteenrating_gender_race = eighteen.groupby(['race_ethnicity','gender'])['performance_rating'].median().sort_values(ascending=False)
eighteenrating_gender_race
race_ethnicity | gender | performance_rating |
---|---|---|
American Indian or Alaska Native (United States of America) | Female | 3.70 |
Prefer Not to Disclose (United States of America) | Female | 3.55 |
White (United States of America) | Male | 3.50 |
Asian (United States of America) | Female | 3.40 |
Asian (United States of America) | Male | 3.40 |
Native Hawaiian or Other Pacific Islander (United States of America) | Male | 3.40 |
White (United States of America) | Female | 3.40 |
Two or More Races (United States of America) | Female | 3.35 |
Black or African American (United States of America) | Male | 3.30 |
Hispanic or Latino (United States of America) | Female | 3.30 |
Hispanic or Latino (United States of America) | Male | 3.30 |
Prefer Not to Disclose (United States of America) | Male | 3.30 |
American Indian or Alaska Native (United States of America) | Male | 3.20 |
Black or African American (United States of America) | Female | 3.20 |
Two or More Races (United States of America) | Male | 3.20 |
White (United States of America) | Prefer not to disclose | 3.20 |
Native Hawaiian or Other Pacific Islander (United States of America) | Female | NaN |
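The yearly performance-rating medians are computed with `.median()` alone, so a cell backed by one or two ratings looks the same as one backed by hundreds. A minimal sketch of doing the per-year groupby in one pass and carrying a count next to each median, assuming a single long frame with a hypothetical `year` column (the notebook keeps separate `sixteen`, `seventeen`, `eighteen` frames) and a hypothetical `min_n` threshold. The default of 5 matches the smallest counts that survive `suppress()` in the other tables, but that is an inference, not the helper's confirmed rule.

```python
import numpy as np
import pandas as pd


def median_with_n(df, keys, value, min_n=5):
    """Median of `value` per group, blanked where fewer than `min_n` non-missing values exist.

    Hypothetical helper; not the notebook's own suppress() function.
    """
    out = df.groupby(keys)[value].agg(n="count", median="median")
    out["median"] = out["median"].mask(out["n"] < min_n)
    return out


# Made-up rows standing in for the per-year appraisal frames.
ratings = pd.DataFrame({
    "year": [2016, 2016, 2016, 2017, 2017],
    "race_ethnicity": ["White", "White", "White", "Asian", "Asian"],
    "gender": ["Female", "Female", "Male", "Female", "Female"],
    "performance_rating": [3.4, 3.6, 3.5, np.nan, 3.2],
})

# min_n=2 only so the toy data shows both a kept and a blanked cell.
summary = median_with_n(ratings, ["year", "race_ethnicity", "gender"], "performance_rating", min_n=2)
print(summary)
```

Because `count` skips missing ratings, the 2017 Asian/Female cell reports one rating, not two, and is blanked.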
reason_for_change = reason_for_change_combined.groupby(['business_process_reason']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change)
count_nonzero | |
---|---|
business_process_reason | |
Request Compensation Change > Adjustment > Contract Increase | 4149 |
Merit > Performance > Annual Performance Appraisal | 2723 |
Request Compensation Change > Adjustment > Change Plan Assignment | 1399 |
Data Change > Data Change > Change Job Details | 800 |
Request Compensation Change > Adjustment > Market Adjustment | 639 |
Transfer > Transfer > Move to another manager | 527 |
Promotion > Promotion > Promotion | 489 |
Hire Employee > New Hire > Fill Vacancy | 345 |
Hire Employee > New Hire > New Position | 244 |
Request Compensation Change > Adjustment > Increased Job Responsibilities | 76 |
Request Compensation Change > Adjustment > Job Change | 66 |
Lateral Move > Lateral Move > Move to Another Position | 53 |
Transfer > Transfer > Transfer between departments | 48 |
Hire Employee > New Hire > Convert Contingent | 41 |
Request Compensation Change > Adjustment > Performance | 39 |
Data Change > Data Change > Change Job Profile | 31 |
Hire Employee > Rehire > Fill Vacancy | 30 |
Transfer > Transfer > Transfer between companies | 26 |
Hire Employee > New Hire > Conversion | 11 |
Hire Employee > Rehire > New Position | 8 |
Data Change > Data Change > Change Location - From International | 5 |
Lateral Move > Lateral Move > Change Job Profile | 5 |
reason_for_change_gender = reason_for_change_combined.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_gender)
count_nonzero | ||
---|---|---|
business_process_reason | gender | |
Request Compensation Change > Adjustment > Contract Increase | Female | 2169 |
Request Compensation Change > Adjustment > Contract Increase | Male | 1975 |
Merit > Performance > Annual Performance Appraisal | Female | 1399 |
Merit > Performance > Annual Performance Appraisal | Male | 1322 |
Request Compensation Change > Adjustment > Change Plan Assignment | Female | 809 |
Request Compensation Change > Adjustment > Change Plan Assignment | Male | 588 |
Data Change > Data Change > Change Job Details | Female | 433 |
Request Compensation Change > Adjustment > Market Adjustment | Female | 389 |
Data Change > Data Change > Change Job Details | Male | 367 |
Promotion > Promotion > Promotion | Female | 321 |
Transfer > Transfer > Move to another manager | Male | 293 |
Request Compensation Change > Adjustment > Market Adjustment | Male | 250 |
Transfer > Transfer > Move to another manager | Female | 234 |
Hire Employee > New Hire > Fill Vacancy | Female | 208 |
Promotion > Promotion > Promotion | Male | 168 |
Hire Employee > New Hire > New Position | Female | 142 |
Hire Employee > New Hire > Fill Vacancy | Male | 135 |
Hire Employee > New Hire > New Position | Male | 102 |
Request Compensation Change > Adjustment > Increased Job Responsibilities | Male | 43 |
Request Compensation Change > Adjustment > Job Change | Female | 37 |
Lateral Move > Lateral Move > Move to Another Position | Male | 34 |
Request Compensation Change > Adjustment > Increased Job Responsibilities | Female | 33 |
Request Compensation Change > Adjustment > Job Change | Male | 29 |
Hire Employee > New Hire > Convert Contingent | Female | 28 |
Transfer > Transfer > Transfer between departments | Female | 24 |
Transfer > Transfer > Transfer between departments | Male | 24 |
Request Compensation Change > Adjustment > Performance | Male | 21 |
Transfer > Transfer > Transfer between companies | Female | 21 |
Data Change > Data Change > Change Job Profile | Female | 19 |
Lateral Move > Lateral Move > Move to Another Position | Female | 19 |
Request Compensation Change > Adjustment > Performance | Female | 18 |
Hire Employee > Rehire > Fill Vacancy | Male | 16 |
Hire Employee > Rehire > Fill Vacancy | Female | 14 |
Hire Employee > New Hire > Convert Contingent | Male | 13 |
Data Change > Data Change > Change Job Profile | Male | 12 |
Hire Employee > New Hire > Conversion | Female | 6 |
Hire Employee > Rehire > New Position | Female | 6 |
Request Compensation Change > Adjustment > Contract Increase | Prefer not to disclose | 5 |
Hire Employee > New Hire > Conversion | Male | 5 |
Transfer > Transfer > Transfer between companies | Male | 5 |
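The reason-by-gender table above is long because each (reason, gender) pair is its own row. A sketch, on made-up rows, of the same tally laid out with `pd.crosstab` so Female and Male counts sit side by side, plus a row-normalized version. Because the groups differ in size, a share can be easier to compare than a raw count; a rate per employee would additionally need each group's headcount, which this sketch does not attempt.

```python
import pandas as pd

# Made-up change events; the notebook's reason_for_change_combined frame is built elsewhere.
changes = pd.DataFrame({
    "business_process_reason": [
        "Promotion > Promotion > Promotion",
        "Promotion > Promotion > Promotion",
        "Promotion > Promotion > Promotion",
        "Merit > Performance > Annual Performance Appraisal",
        "Merit > Performance > Annual Performance Appraisal",
    ],
    "gender": ["Female", "Female", "Male", "Female", "Male"],
})

# One row per reason, one column per gender; pairs that never occur become 0.
table = pd.crosstab(changes["business_process_reason"], changes["gender"])

# Share of each reason's events that went to each gender (each row sums to 1).
shares = pd.crosstab(changes["business_process_reason"], changes["gender"], normalize="index")

print(table)
print(shares.round(2))
```

Small-cell suppression would still need to be applied to `table` before sharing it, as the notebook does for its own tables.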
reason_for_change_race = reason_for_change_combined.groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_race)
count_nonzero | ||
---|---|---|
business_process_reason | race_ethnicity | |
Request Compensation Change > Adjustment > Contract Increase | White (United States of America) | 2656 |
Merit > Performance > Annual Performance Appraisal | White (United States of America) | 1775 |
Request Compensation Change > Adjustment > Change Plan Assignment | White (United States of America) | 882 |
Request Compensation Change > Adjustment > Contract Increase | Black or African American (United States of America) | 761 |
Data Change > Data Change > Change Job Details | White (United States of America) | 538 |
... | ... | ... |
Hire Employee > New Hire > Convert Contingent | Asian (United States of America) | 5 |
Lateral Move > Lateral Move > Change Job Profile | Black or African American (United States of America) | 5 |
Hire Employee > Rehire > Fill Vacancy | Asian (United States of America) | 5 |
Data Change > Data Change > Change Job Details | Prefer Not to Disclose (United States of America) | 5 |
Merit > Performance > Annual Performance Appraisal | Native Hawaiian or Other Pacific Islander (United States of America) | 5 |
80 rows × 1 columns
reason_for_change_race_gender = reason_for_change_combined.groupby(['business_process_reason','race_ethnicity','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(reason_for_change_race_gender)
count_nonzero | |||
---|---|---|---|
business_process_reason | race_ethnicity | gender | |
Request Compensation Change > Adjustment > Contract Increase | White (United States of America) | Female | 1356 |
Request Compensation Change > Adjustment > Contract Increase | White (United States of America) | Male | 1295 |
Merit > Performance > Annual Performance Appraisal | White (United States of America) | Male | 899 |
Merit > Performance > Annual Performance Appraisal | White (United States of America) | Female | 874 |
Request Compensation Change > Adjustment > Change Plan Assignment | White (United States of America) | Female | 497 |
... | ... | ... | ... |
Hire Employee > New Hire > Convert Contingent | Black or African American (United States of America) | Female | 5 |
Request Compensation Change > Adjustment > Contract Increase | White (United States of America) | Prefer not to disclose | 5 |
Request Compensation Change > Adjustment > Contract Increase | Native Hawaiian or Other Pacific Islander (United States of America) | Male | 5 |
Transfer > Transfer > Transfer between companies | White (United States of America) | Male | 5 |
Hire Employee > New Hire > Fill Vacancy | Prefer Not to Disclose (United States of America) | Male | 5 |
121 rows × 1 columns
current_news_gender_salaried = news_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_gender_salaried)
count_nonzero | |
---|---|
gender | |
Female | 336 |
Male | 320 |
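The headcounts in these tables come from `np.count_nonzero`, which counts values that are not zero rather than counting rows. For a pay column the two usually agree, but a missing (`NaN`) pay is counted, because NaN is not equal to zero, while a `0.0` pay is not. Whether any such values occur in the Post files is not shown here. A sketch on made-up rows comparing it with pandas' `count` (non-missing values) and `size` (all rows):

```python
import numpy as np
import pandas as pd

# Made-up rows: one missing pay for a woman, one zero pay for a man.
pay = pd.DataFrame({
    "gender": ["Female", "Female", "Male", "Male"],
    "current_base_pay": [90000.0, np.nan, 120000.0, 0.0],
})

by_gender = pay.groupby("gender")["current_base_pay"]
tally = pd.DataFrame({
    "count_nonzero": by_gender.apply(np.count_nonzero),  # drops 0.0, keeps NaN
    "count": by_gender.count(),                           # drops NaN, keeps 0.0
    "size": pay.groupby("gender").size(),                 # every row
})
print(tally)
```

If neither zeros nor missing values are present, all three columns match, which is the likely situation for the tables above.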
current_news_gender_hourly = news_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_gender_hourly)
count_nonzero | |
---|---|
gender | |
Female | 50 |
Male | 33 |
current_news_gender_salaried_median = news_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_median)
count_nonzero | median | |
---|---|---|
gender | ||
Female | 336 | 102700.04 |
Male | 320 | 120976.60 |
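The overall medians can be summarized as a ratio and a dollar difference. A worked sketch using only the numbers printed in the salaried table above and the hourly table that follows it. This raw comparison does not adjust for age, tenure or role, which the later tables break out.

```python
# Medians copied from the notebook's salaried and hourly gender tables.
medians = {
    "salaried": {"Female": 102700.04, "Male": 120976.60},
    "hourly": {"Female": 35.90, "Male": 33.77},
}

# Women's median as a fraction of men's, and men's median minus women's.
ratios = {k: v["Female"] / v["Male"] for k, v in medians.items()}
gaps = {k: v["Male"] - v["Female"] for k, v in medians.items()}

for k in medians:
    print(f"{k}: female/male = {ratios[k]:.3f}, male minus female = {gaps[k]:,.2f}")
```

For salaried staff this gives a ratio of about 0.849, a difference of $18,276.56 a year. For hourly staff the ratio is about 1.063, with women's median $2.13 an hour higher.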
current_news_gender_hourly_median = news_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_median)
count_nonzero | median | |
---|---|---|
gender | ||
Female | 50 | 35.90 |
Male | 33 | 33.77 |
current_news_gender_age_salaried = news_salaried.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_news_gender_age_salaried
gender | age |
---|---|
Male | 42.00 |
Female | 35.50 |
Prefer not to disclose | 30.00 |
current_news_gender_age_hourly = news_hourly.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_news_gender_age_hourly
gender | age |
---|---|
Female | 34.00 |
Male | 34.00 |
current_news_gender_age_5_salary = news_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_5_salary)
count_nonzero | median | ||
---|---|---|---|
age_group_5 | gender | ||
<25 | Female | 13.00 | 69280.00 |
25-29 | Female | 79.00 | 83780.00 |
25-29 | Male | 32.00 | 80830.00 |
30-34 | Female | 61.00 | 93000.00 |
30-34 | Male | 44.00 | 99170.00 |
35-39 | Female | 58.00 | 105790.00 |
35-39 | Male | 60.00 | 127373.27 |
40-44 | Female | 28.00 | 121560.00 |
40-44 | Male | 43.00 | 131410.00 |
45-49 | Female | 23.00 | 133000.00 |
45-49 | Male | 28.00 | 124568.47 |
50-54 | Female | 27.00 | 113280.00 |
50-54 | Male | 43.00 | 134140.00 |
55-59 | Female | 21.00 | 120363.00 |
55-59 | Male | 26.00 | 140697.96 |
60-64 | Female | 21.00 | 155522.35 |
60-64 | Male | 28.00 | 152863.59 |
65+ | Female | 5.00 | 162355.42 |
65+ | Male | 13.00 | 128180.00 |
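The `age_group_5` and `age_group_10` columns are built elsewhere in the notebook. A hypothetical reconstruction with `pd.cut`, where the bin edges are inferred only from the labels shown in these tables. `right=False` makes each bin include its lower edge and exclude its upper one, so an age of exactly 25 lands in "25-29" rather than "<25".

```python
import numpy as np
import pandas as pd

# Edges inferred from the printed labels; the notebook's own construction may differ.
edges_5 = [0, 25, 30, 35, 40, 45, 50, 55, 60, 65, np.inf]
labels_5 = ["<25", "25-29", "30-34", "35-39", "40-44",
            "45-49", "50-54", "55-59", "60-64", "65+"]
edges_10 = [0, 25, 35, 45, 55, 65, np.inf]
labels_10 = ["<25", "25-34", "35-44", "45-54", "55-64", "65+"]

ages = pd.Series([22, 25, 29, 30, 44, 64, 65, 71])

# Each bin is [low, high) because right=False.
age_group_5 = pd.cut(ages, bins=edges_5, labels=labels_5, right=False)
age_group_10 = pd.cut(ages, bins=edges_10, labels=labels_10, right=False)

print(pd.DataFrame({"age": ages, "age_group_5": age_group_5, "age_group_10": age_group_10}))
```

`pd.cut` returns an ordered categorical, so a later `groupby` lists the bands in age order. Plain strings would sort "<25" after "65+", because "<" comes after the digits in ASCII.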
current_news_gender_age_5_hourly = news_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_5_hourly)
count_nonzero | median | ||
---|---|---|---|
age_group_5 | gender | ||
<25 | Female | 11 | 32.65 |
25-29 | Female | 8 | 33.85 |
25-29 | Male | 7 | 17.30 |
30-34 | Female | 7 | 33.33 |
30-34 | Male | 7 | 37.18 |
50-54 | Female | 5 | 44.67 |
65+ | Female | 5 | 44.46 |
current_news_gender_age_10_salary = news_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_10_salary)
count_nonzero | median | ||
---|---|---|---|
age_group_10 | gender | ||
<25 | Female | 13.00 | 69280.00 |
25-34 | Female | 140.00 | 90000.00 |
25-34 | Male | 76.00 | 90140.00 |
35-44 | Female | 86.00 | 110016.26 |
35-44 | Male | 103.00 | 130000.00 |
45-54 | Female | 50.00 | 125350.00 |
45-54 | Male | 71.00 | 127840.00 |
55-64 | Female | 42.00 | 140669.27 |
55-64 | Male | 54.00 | 144111.84 |
65+ | Female | 5.00 | 162355.42 |
65+ | Male | 13.00 | 128180.00 |
current_news_gender_age_10_hourly = news_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_age_10_hourly)
count_nonzero | median | ||
---|---|---|---|
age_group_10 | gender | ||
<25 | Female | 11 | 32.65 |
25-34 | Female | 15 | 33.85 |
25-34 | Male | 14 | 23.56 |
35-44 | Female | 5 | 36.05 |
35-44 | Male | 5 | 32.77 |
45-54 | Female | 7 | 51.30 |
55-64 | Female | 7 | 42.94 |
55-64 | Male | 7 | 34.77 |
65+ | Female | 5 | 44.46 |
current_news_gender_salaried_under_40 = news_salaried[news_salaried['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_under_40)
count_nonzero | median | |
---|---|---|
gender | ||
Female | 211 | 90440.00 |
Male | 139 | 103400.00 |
current_news_gender_salaried_over_40 = news_salaried[news_salaried['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_salaried_over_40)
count_nonzero | median | |
---|---|---|
gender | ||
Female | 125 | 128280.00 |
Male | 181 | 131410.00 |
current_news_gender_hourly_under_40 = news_hourly[news_hourly['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_under_40)
count_nonzero | median | |
---|---|---|
gender | ||
Female | 28 | 33.85 |
Male | 20 | 31.12 |
current_news_gender_hourly_over_40 = news_hourly[news_hourly['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_gender_hourly_over_40)
count_nonzero | median | |
---|---|---|
gender | ||
Female | 22 | 44.56 |
Male | 13 | 34.18 |
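The under-40 and 40-and-over cells filter twice, once with `age < 40` and once with `age > 39`. Those two filters cover everyone exactly once only if `age` holds whole numbers. A sketch, on made-up rows, that labels the band once and gets both halves from a single groupby, which removes that dependency. The `age_band` column name is hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up staff rows; the notebook's news_salaried frame is built elsewhere.
staff = pd.DataFrame({
    "age": [28, 39, 40, 52, 33, 61],
    "gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "current_base_pay": [85000.0, 95000.0, 120000.0, 130000.0, 100000.0, 140000.0],
})

# One label per person, so no one can fall between or into both filters.
staff["age_band"] = np.where(staff["age"] < 40, "under 40", "40 and over")

split = (
    staff.groupby(["age_band", "gender"])["current_base_pay"]
    .agg(n="count", median="median")
)
print(split)
```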
current_news_race_salaried = news_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_salaried)
count_nonzero | |
---|---|
race_ethnicity | |
White (United States of America) | 437 |
Black or African American (United States of America) | 60 |
Asian (United States of America) | 59 |
Hispanic or Latino (United States of America) | 42 |
Two or More Races (United States of America) | 18 |
Prefer Not to Disclose (United States of America) | 12 |
current_news_race_hourly = news_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_hourly)
count_nonzero | |
---|---|
race_ethnicity | |
White (United States of America) | 58 |
Asian (United States of America) | 11 |
Black or African American (United States of America) | 8 |
current_news_race_group_salaried = news_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_group_salaried)
count_nonzero | |
---|---|
race_grouping | |
white | 437 |
person of color | 181 |
unknown | 39 |
current_news_race_group_hourly = news_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_news_race_group_hourly)
count_nonzero | |
---|---|
race_grouping | |
white | 58 |
person of color | 24 |
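The `race_grouping` column is built elsewhere and its exact rule is not shown. The counts hint at it: 39 salaried employees are "unknown" but only 12 chose "Prefer Not to Disclose," so "unknown" probably also absorbs blank or missing race values. The mapping below is a hypothetical reconstruction under that assumption, not the notebook's confirmed logic.

```python
import pandas as pd


def race_grouping(value):
    """Hypothetical mapping from race_ethnicity to a three-way grouping."""
    # Assumption: missing values and "Prefer Not to Disclose" both count as unknown.
    if pd.isna(value) or value.startswith("Prefer Not to Disclose"):
        return "unknown"
    if value.startswith("White"):
        return "white"
    # Assumption: every other reported category is grouped together.
    return "person of color"


people = pd.Series([
    "White (United States of America)",
    "Asian (United States of America)",
    "Two or More Races (United States of America)",
    "Prefer Not to Disclose (United States of America)",
    None,
])

grouped = people.map(race_grouping)
print(grouped.value_counts())
```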
current_news_race_median_salaried = news_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_median_salaried)
count_nonzero | median | |
---|---|---|
race_ethnicity | ||
White (United States of America) | 437 | 113809.68 |
Asian (United States of America) | 59 | 103970.05 |
Black or African American (United States of America) | 60 | 102700.04 |
Hispanic or Latino (United States of America) | 42 | 95780.04 |
Prefer Not to Disclose (United States of America) | 12 | 94890.00 |
Two or More Races (United States of America) | 18 | 91090.00 |
current_news_race_median_hourly = news_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_median_hourly)
count_nonzero | median | |
---|---|---|
race_ethnicity | ||
White (United States of America) | 58 | 35.40 |
Asian (United States of America) | 11 | 34.28 |
Black or African American (United States of America) | 8 | 23.22 |
current_news_race_group_median_salaried = news_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_group_median_salaried)
count_nonzero | median | |
---|---|---|
race_grouping | ||
unknown | 39 | 130000.00 |
white | 437 | 113809.68 |
person of color | 181 | 98435.00 |
current_news_race_group_median_hourly = news_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_group_median_hourly)
count_nonzero | median | |
---|---|---|
race_grouping | ||
white | 58 | 35.40 |
person of color | 24 | 33.59 |
current_news_race_age_salaried = news_salaried.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_news_race_age_salaried
race_ethnicity | age |
---|---|
American Indian or Alaska Native (United States of America) | 46.00 |
Native Hawaiian or Other Pacific Islander (United States of America) | 45.00 |
White (United States of America) | 41.00 |
Black or African American (United States of America) | 39.00 |
Hispanic or Latino (United States of America) | 35.50 |
Asian (United States of America) | 35.00 |
Prefer Not to Disclose (United States of America) | 31.00 |
Two or More Races (United States of America) | 30.00 |
current_news_race_age_hourly = news_hourly.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_news_race_age_hourly
race_ethnicity | age |
---|---|
American Indian or Alaska Native (United States of America) | 71.00 |
White (United States of America) | 36.50 |
Black or African American (United States of America) | 30.50 |
Asian (United States of America) | 28.00 |
Hispanic or Latino (United States of America) | 25.00 |
Two or More Races (United States of America) | 22.00 |
current_news_race_age_5_salary = news_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_5_salary)
count_nonzero | median | ||
---|---|---|---|
age_group_5 | race_ethnicity | ||
<25 | White (United States of America) | 9.00 | 68000.00 |
25-29 | Asian (United States of America) | 17.00 | 90000.00 |
25-29 | Black or African American (United States of America) | 10.00 | 82000.00 |
25-29 | Hispanic or Latino (United States of America) | 12.00 | 86690.00 |
25-29 | Two or More Races (United States of America) | 7.00 | 79000.00 |
25-29 | White (United States of America) | 60.00 | 80670.00 |
30-34 | Asian (United States of America) | 9.00 | 120000.00 |
30-34 | Black or African American (United States of America) | 11.00 | 92000.00 |
30-34 | Hispanic or Latino (United States of America) | 8.00 | 93894.94 |
30-34 | Prefer Not to Disclose (United States of America) | 5.00 | 95000.00 |
30-34 | White (United States of America) | 62.00 | 95104.86 |
35-39 | Asian (United States of America) | 12.00 | 108170.00 |
35-39 | Black or African American (United States of America) | 7.00 | 120000.00 |
35-39 | Hispanic or Latino (United States of America) | 8.00 | 100280.00 |
35-39 | White (United States of America) | 78.00 | 116420.00 |
40-44 | Asian (United States of America) | 8.00 | 117246.75 |
40-44 | Black or African American (United States of America) | 10.00 | 133640.00 |
40-44 | Hispanic or Latino (United States of America) | 7.00 | 96780.08 |
40-44 | White (United States of America) | 42.00 | 129390.00 |
45-49 | White (United States of America) | 39.00 | 129828.93 |
50-54 | Asian (United States of America) | 5.00 | 118821.01 |
50-54 | Black or African American (United States of America) | 9.00 | 110560.00 |
50-54 | White (United States of America) | 50.00 | 129625.00 |
55-59 | Black or African American (United States of America) | 6.00 | 133849.19 |
55-59 | White (United States of America) | 37.00 | 130000.00 |
60-64 | White (United States of America) | 42.00 | 144428.01 |
65+ | White (United States of America) | 18.00 | 140134.90 |
current_news_race_age_5_hourly = news_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_5_hourly)
count_nonzero | median | ||
---|---|---|---|
age_group_5 | race_ethnicity | ||
<25 | White (United States of America) | 6.00 | 32.70 |
25-29 | White (United States of America) | 10.00 | 20.88 |
30-34 | White (United States of America) | 10.00 | 34.36 |
35-39 | White (United States of America) | 5.00 | 36.05 |
50-54 | White (United States of America) | 6.00 | 39.42 |
55-59 | White (United States of America) | 5.00 | 39.04 |
60-64 | White (United States of America) | 7.00 | 35.69 |
current_news_race_age_10_salary = news_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_10_salary)
count_nonzero | median | ||
---|---|---|---|
age_group_10 | race_ethnicity | ||
<25 | White (United States of America) | 9.00 | 68000.00 |
25-34 | Asian (United States of America) | 26.00 | 90530.00 |
25-34 | Black or African American (United States of America) | 21.00 | 90000.00 |
25-34 | Hispanic or Latino (United States of America) | 20.00 | 91254.94 |
25-34 | Prefer Not to Disclose (United States of America) | 9.00 | 90000.00 |
25-34 | Two or More Races (United States of America) | 10.00 | 89500.00 |
25-34 | White (United States of America) | 122.00 | 88870.00 |
35-44 | Asian (United States of America) | 20.00 | 110016.26 |
35-44 | Black or African American (United States of America) | 17.00 | 131090.00 |
35-44 | Hispanic or Latino (United States of America) | 15.00 | 96780.08 |
35-44 | Two or More Races (United States of America) | 5.00 | 128280.00 |
35-44 | White (United States of America) | 120.00 | 119280.00 |
45-54 | Asian (United States of America) | 7.00 | 118821.01 |
45-54 | Black or African American (United States of America) | 12.00 | 112524.14 |
45-54 | Hispanic or Latino (United States of America) | 6.00 | 134232.40 |
45-54 | White (United States of America) | 89.00 | 129690.00 |
55-64 | Black or African American (United States of America) | 8.00 | 138188.95 |
55-64 | White (United States of America) | 79.00 | 138119.40 |
65+ | White (United States of America) | 18.00 | 140134.90 |
current_news_race_age_10_hourly = news_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_age_10_hourly)
count_nonzero | median | ||
---|---|---|---|
age_group_10 | race_ethnicity | ||
<25 | White (United States of America) | 6.00 | 32.70 |
25-34 | White (United States of America) | 20.00 | 32.48 |
35-44 | White (United States of America) | 7.00 | 35.40 |
45-54 | White (United States of America) | 9.00 | 44.67 |
55-64 | White (United States of America) | 12.00 | 37.36 |
current_news_race_group_age_5_salary = news_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_5_salary)
count_nonzero | median | ||
---|---|---|---|
age_group_5 | race_grouping | ||
<25 | person of color | 7.00 | 80060.00 |
<25 | white | 9.00 | 68000.00 |
25-29 | person of color | 46.00 | 85641.00 |
25-29 | unknown | 5.00 | 90000.00 |
25-29 | white | 60.00 | 80670.00 |
30-34 | person of color | 31.00 | 93000.00 |
30-34 | unknown | 13.00 | 98340.00 |
30-34 | white | 62.00 | 95104.86 |
35-39 | person of color | 29.00 | 108000.00 |
35-39 | unknown | 11.00 | 133560.00 |
35-39 | white | 78.00 | 116420.00 |
40-44 | person of color | 28.00 | 125812.50 |
40-44 | white | 42.00 | 129390.00 |
45-49 | person of color | 9.00 | 103970.05 |
45-49 | white | 39.00 | 129828.93 |
50-54 | person of color | 19.00 | 117924.49 |
50-54 | white | 50.00 | 129625.00 |
55-59 | person of color | 8.00 | 104560.78 |
55-59 | white | 37.00 | 130000.00 |
60-64 | white | 42.00 | 144428.01 |
65+ | white | 18.00 | 140134.90 |
current_news_race_group_age_5_hourly = news_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_5_hourly)
count_nonzero | median | ||
---|---|---|---|
age_group_5 | race_grouping | ||
<25 | person of color | 8.00 | 33.70 |
<25 | white | 6.00 | 32.70 |
25-29 | person of color | 5.00 | 25.41 |
25-29 | white | 10.00 | 20.88 |
30-34 | white | 10.00 | 34.36 |
35-39 | white | 5.00 | 36.05 |
50-54 | white | 6.00 | 39.42 |
55-59 | white | 5.00 | 39.04 |
60-64 | white | 7.00 | 35.69 |
current_news_race_group_age_10_salary = news_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_10_salary)
count_nonzero | median | ||
---|---|---|---|
age_group_10 | race_grouping | ||
<25 | person of color | 7.00 | 80060.00 |
<25 | white | 9.00 | 68000.00 |
25-34 | person of color | 77.00 | 90000.00 |
25-34 | unknown | 18.00 | 92500.00 |
25-34 | white | 122.00 | 88870.00 |
35-44 | person of color | 57.00 | 110560.00 |
35-44 | unknown | 12.00 | 135060.00 |
35-44 | white | 120.00 | 119280.00 |
45-54 | person of color | 28.00 | 116206.39 |
45-54 | white | 89.00 | 129690.00 |
55-64 | person of color | 12.00 | 138188.95 |
55-64 | unknown | 5.00 | 165742.33 |
55-64 | white | 79.00 | 138119.40 |
65+ | white | 18.00 | 140134.90 |
current_news_race_group_age_10_hourly = news_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_group_age_10_hourly)
count_nonzero | median | ||
---|---|---|---|
age_group_10 | race_grouping | ||
<25 | person of color | 8.00 | 33.70 |
<25 | white | 6.00 | 32.70 |
25-34 | person of color | 8.00 | 29.37 |
25-34 | white | 20.00 | 32.48 |
35-44 | white | 7.00 | 35.40 |
45-54 | white | 9.00 | 44.67 |
55-64 | white | 12.00 | 37.36 |
current_news_race_under_40_salaried = news_salaried[news_salaried['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_under_40_salaried)
count_nonzero | median | |
---|---|---|
race_ethnicity | ||
White (United States of America) | 209 | 96000.00 |
Prefer Not to Disclose (United States of America) | 11 | 94780.00 |
Asian (United States of America) | 41 | 94280.00 |
Hispanic or Latino (United States of America) | 28 | 92780.00 |
Black or African American (United States of America) | 30 | 92250.00 |
Two or More Races (United States of America) | 14 | 89500.00 |
current_news_race_over_40_salaried = news_salaried[news_salaried['age'] > 39].groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_over_40_salaried)
count_nonzero | median | |
---|---|---|
race_grouping | ||
unknown | 10 | 151785.44 |
white | 228 | 130799.71 |
person of color | 68 | 121874.57 |
current_news_race_under_40_hourly = news_hourly[news_hourly['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_under_40_hourly)
count_nonzero | median | |
---|---|---|
race_ethnicity | ||
White (United States of America) | 31 | 33.33 |
Asian (United States of America) | 7 | 32.40 |
Black or African American (United States of America) | 5 | 21.71 |
current_news_race_over_40_hourly = news_hourly[news_hourly['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_race_over_40_hourly)
count_nonzero | median | |
---|---|---|
race_ethnicity | ||
White (United States of America) | 27 | 39.62 |
current_news_race_gender_salaried = news_salaried.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_race_gender_salaried)
count_nonzero | ||
---|---|---|
race_ethnicity | gender | |
Asian (United States of America) | Female | 43 |
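Asian (United States of America) | Male | 16 |
Black or African American (United States of America) | Female | 31 |
Black or African American (United States of America) | Male | 29 |
Hispanic or Latino (United States of America) | Female | 21 |
Hispanic or Latino (United States of America) | Male | 21 |
Prefer Not to Disclose (United States of America) | Female | 5 |
Prefer Not to Disclose (United States of America) | Male | 7 |
Two or More Races (United States of America) | Female | 13 |
Two or More Races (United States of America) | Male | 5 |
White (United States of America) | Female | 207 |
White (United States of America) | Male | 229 |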
current_news_race_gender_hourly = news_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_news_race_gender_hourly)
count_nonzero | ||
---|---|---|
race_ethnicity | gender | |
Asian (United States of America) | Female | 8 |
White (United States of America) | Female | 33 |
White (United States of America) | Male | 25 |
current_news_race_gender_median_salaried = news_salaried.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_median_salaried)
count_nonzero | median | ||
---|---|---|---|
race_grouping | gender | ||
person of color | Female | 109 | 94840.00 |
person of color | Male | 72 | 104420.04 |
unknown | Female | 20 | 116890.00 |
unknown | Male | 19 | 133560.00 |
white | Female | 207 | 105780.00 |
white | Male | 229 | 123796.94 |
current_news_race_gender_median_hourly = news_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_median_hourly)
count_nonzero | median | ||
---|---|---|---|
race_ethnicity | gender | ||
Asian (United States of America) | Female | 8 | 33.34 |
White (United States of America) | Female | 33 | 38.46 |
White (United States of America) | Male | 25 | 33.77 |
current_news_race_gender_under_40_salaried = news_salaried[news_salaried['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_under_40_salaried)
count_nonzero | median | ||
---|---|---|---|
race_ethnicity | gender | ||
Asian (United States of America) | Female | 31 | 94280.00 |
Asian (United States of America) | Male | 10 | 100200.00 |
Black or African American (United States of America) | Female | 18 | 87281.00 |
Black or African American (United States of America) | Male | 12 | 110000.00 |
Hispanic or Latino (United States of America) | Female | 20 | 92780.00 |
Hispanic or Latino (United States of America) | Male | 8 | 91260.00 |
Prefer Not to Disclose (United States of America) | Female | 5 | 90000.00 |
Prefer Not to Disclose (United States of America) | Male | 6 | 96560.00 |
Two or More Races (United States of America) | Female | 10 | 90390.00 |
White (United States of America) | Female | 116 | 90030.00 |
White (United States of America) | Male | 92 | 103700.00 |
current_news_race_gender_under_40_hourly = news_hourly[news_hourly['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_under_40_hourly)
count_nonzero | median | ||
---|---|---|---|
race_ethnicity | gender | ||
Asian (United States of America) | Female | 5 | 32.40 |
White (United States of America) | Female | 16 | 33.59 |
White (United States of America) | Male | 15 | 32.75 |
current_news_race_gender_over_40_salaried = news_salaried[news_salaried['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_over_40_salaried)
count_nonzero | median | ||
---|---|---|---|
race_ethnicity | gender | ||
Asian (United States of America) | Female | 12 | 114484.76 |
Asian (United States of America) | Male | 6 | 125812.50 |
Black or African American (United States of America) | Female | 13 | 125700.00 |
Black or African American (United States of America) | Male | 17 | 114488.29 |
Hispanic or Latino (United States of America) | Male | 13 | 98411.56 |
White (United States of America) | Female | 91 | 127340.00 |
White (United States of America) | Male | 137 | 137799.88 |
current_news_race_gender_over_40_hourly = news_hourly[news_hourly['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_race_gender_over_40_hourly)
count_nonzero | median | ||
---|---|---|---|
race_ethnicity | gender | ||
White (United States of America) | Female | 17 | 44.67 |
White (United States of America) | Male | 10 | 33.98 |
current_news_yos_salary = news_salaried.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_salary)
| years_of_service_grouped | count_nonzero | median |
|---|---|---|
| 0 | 56 | 90000.00 |
| 1-2 | 139 | 92780.00 |
| 3-5 | 154 | 102210.00 |
| 6-10 | 117 | 115000.00 |
| 11-15 | 48 | 125342.50 |
| 16-20 | 51 | 120250.77 |
| 21-25 | 53 | 140206.92 |
| 25+ | 39 | 138517.37 |
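`suppress` is defined earlier in the notebook, and its body is not shown here. Because the smallest count in any of these tables is 5, it appears to hide groups with fewer than five members. The sketch below assumes exactly that rule. `suppress_small_groups` and `MIN_GROUP = 5` are hypothetical names, and the threshold is inferred from the output rather than confirmed.

```python
import numpy as np
import pandas as pd

MIN_GROUP = 5  # assumed threshold; the smallest count shown in these tables is 5


def suppress_small_groups(table: pd.DataFrame, min_n: int = MIN_GROUP) -> pd.DataFrame:
    """Drop rows whose group count is below `min_n` (hypothetical stand-in for `suppress`)."""
    counts = table[("current_base_pay", "count_nonzero")]
    return table[counts >= min_n]


# Toy data (hypothetical): desk B has only two people.
toy = pd.DataFrame({
    "desk": ["A"] * 6 + ["B"] * 2,
    "current_base_pay": [90.0, 91.0, 92.0, 93.0, 94.0, 95.0, 120.0, 130.0],
})
# Same column layout as the notebook: ('current_base_pay', 'count_nonzero'/'median').
table = toy.groupby("desk").agg({"current_base_pay": [np.count_nonzero, "median"]})
result = suppress_small_groups(table)
print(result)
```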
current_news_yos_hourly = news_hourly.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_hourly)
| years_of_service_grouped | count_nonzero | median |
|---|---|---|
| 0 | 13 | 35.90 |
| 1-2 | 21 | 21.31 |
| 3-5 | 13 | 33.85 |
| 6-10 | 10 | 38.89 |
| 11-15 | 6 | 37.55 |
| 16-20 | 8 | 38.19 |
| 21-25 | 7 | 35.69 |
| 25+ | 5 | 39.62 |
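The code that builds `years_of_service_grouped` is not in this section. One way to reproduce its labels is `pd.cut` with right-closed bins, assuming service is measured in whole years. The edges below are an assumption, including the reading of "25+" as more than 25 years: otherwise a 25-year employee would fit both "21-25" and "25+". `pd.cut` also returns an ordered categorical, so `groupby` lists buckets in service order rather than alphabetically, as the tables above do.

```python
import numpy as np
import pandas as pd

# Assumed right-closed edges: (-inf, 0] -> "0", (0, 2] -> "1-2", ..., (25, inf) -> "25+".
YOS_BINS = [-np.inf, 0, 2, 5, 10, 15, 20, 25, np.inf]
YOS_LABELS = ["0", "1-2", "3-5", "6-10", "11-15", "16-20", "21-25", "25+"]

years = pd.Series([0, 1, 2, 3, 10, 25, 26])  # whole years of service (toy values)
grouped = pd.cut(years, bins=YOS_BINS, labels=YOS_LABELS)
print(grouped.tolist())
```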
current_news_yos_gender_salary = news_salaried.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_gender_salary)
| years_of_service_grouped | gender | count_nonzero | median |
|---|---|---|---|
| 0 | Female | 38 | 90000.00 |
|  | Male | 18 | 107890.00 |
| 1-2 | Female | 82 | 88870.00 |
|  | Male | 56 | 105170.00 |
| 3-5 | Female | 78 | 95000.00 |
|  | Male | 76 | 108395.00 |
| 6-10 | Female | 53 | 110148.50 |
|  | Male | 64 | 118235.50 |
| 11-15 | Female | 21 | 104618.47 |
|  | Male | 27 | 139410.00 |
| 16-20 | Female | 27 | 113809.68 |
|  | Male | 24 | 129909.46 |
| 21-25 | Female | 22 | 138079.17 |
|  | Male | 31 | 140206.92 |
| 25+ | Female | 15 | 137429.69 |
|  | Male | 24 | 144309.47 |
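Reading a gender gap out of these two-level tables means comparing alternating rows. The sketch below moves `gender` into columns with `unstack` and computes the female-to-male ratio of medians, using three buckets copied from the salaried table above. For example, in the "0" bucket the ratio is 90,000 / 107,890, or about 0.834. As the introduction cautions, a ratio of medians within one service bucket controls for nothing else, such as desk or role.

```python
import pandas as pd

# Medians copied from the salaried years-of-service-by-gender table (three buckets only).
medians = pd.DataFrame(
    {
        "years_of_service_grouped": ["0", "0", "11-15", "11-15", "25+", "25+"],
        "gender": ["Female", "Male"] * 3,
        "median": [90000.00, 107890.00, 104618.47, 139410.00, 137429.69, 144309.47],
    }
).set_index(["years_of_service_grouped", "gender"])

# Put Female and Male side by side, one row per service bucket.
wide = medians["median"].unstack("gender")
wide["female_to_male"] = (wide["Female"] / wide["Male"]).round(3)
print(wide)
```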
current_news_yos_gender_hourly = news_hourly.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_gender_hourly)
| years_of_service_grouped | gender | count_nonzero | median |
|---|---|---|---|
| 0 | Female | 12 | 35.90 |
| 1-2 | Female | 12 | 21.16 |
|  | Male | 9 | 21.71 |
| 3-5 | Female | 5 | 40.80 |
|  | Male | 8 | 33.77 |
| 6-10 | Male | 6 | 35.38 |
| 16-20 | Female | 5 | 44.46 |
| 21-25 | Female | 5 | 40.63 |
current_news_yos_race_salary = news_salaried.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_salary)
| years_of_service_grouped | race_ethnicity | count_nonzero | median |
|---|---|---|---|
| 0 | Asian (United States of America) | 7 | 90000.00 |
|  | Black or African American (United States of America) | 11 | 95000.00 |
|  | Hispanic or Latino (United States of America) | 8 | 102890.00 |
|  | Two or More Races (United States of America) | 5 | 90000.00 |
|  | White (United States of America) | 20 | 81390.00 |
| 1-2 | Asian (United States of America) | 17 | 90000.00 |
|  | Black or African American (United States of America) | 12 | 92640.00 |
|  | Hispanic or Latino (United States of America) | 12 | 80890.00 |
|  | Prefer Not to Disclose (United States of America) | 6 | 92390.00 |
|  | Two or More Races (United States of America) | 5 | 89000.00 |
|  | White (United States of America) | 80 | 94890.00 |
| 3-5 | Asian (United States of America) | 13 | 95000.00 |
|  | Black or African American (United States of America) | 11 | 132910.00 |
|  | Hispanic or Latino (United States of America) | 10 | 93920.00 |
|  | Prefer Not to Disclose (United States of America) | 5 | 95000.00 |
|  | Two or More Races (United States of America) | 7 | 91400.00 |
|  | White (United States of America) | 105 | 104690.00 |
| 6-10 | Asian (United States of America) | 10 | 110429.25 |
|  | Black or African American (United States of America) | 9 | 102692.61 |
|  | Hispanic or Latino (United States of America) | 9 | 94780.00 |
|  | White (United States of America) | 79 | 115840.00 |
| 11-15 | White (United States of America) | 42 | 123560.00 |
| 16-20 | Black or African American (United States of America) | 9 | 114488.29 |
|  | White (United States of America) | 36 | 121755.79 |
| 21-25 | Black or African American (United States of America) | 6 | 110616.14 |
|  | White (United States of America) | 44 | 141741.72 |
| 25+ | White (United States of America) | 31 | 134036.17 |
current_news_yos_race_hourly = news_hourly.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_hourly)
| years_of_service_grouped | race_ethnicity | count_nonzero | median |
|---|---|---|---|
| 0 | Asian (United States of America) | 5 | 37.44 |
| 1-2 | White (United States of America) | 15 | 25.91 |
| 3-5 | White (United States of America) | 10 | 35.48 |
| 6-10 | White (United States of America) | 9 | 37.99 |
| 16-20 | White (United States of America) | 6 | 47.92 |
| 21-25 | White (United States of America) | 7 | 35.69 |
current_news_yos_race_gender_salary = news_salaried.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_gender_salary)
| years_of_service_grouped | race_grouping | gender | count_nonzero | median |
|---|---|---|---|---|
| 0 | person of color | Female | 20 | 95000.00 |
|  |  | Male | 11 | 92000.00 |
|  | white | Female | 14 | 76250.00 |
|  |  | Male | 6 | 162500.00 |
| 1-2 | person of color | Female | 31 | 84780.00 |
|  |  | Male | 15 | 104560.00 |
|  | unknown | Female | 9 | 102780.00 |
|  | white | Female | 42 | 89120.00 |
|  |  | Male | 37 | 107560.00 |
| 3-5 | person of color | Female | 26 | 93420.00 |
|  |  | Male | 15 | 120000.00 |
|  | unknown | Male | 5 | 126840.00 |
|  | white | Female | 49 | 98200.00 |
|  |  | Male | 56 | 107097.80 |
| 6-10 | person of color | Female | 15 | 104320.14 |
|  |  | Male | 15 | 98411.56 |
|  | unknown | Male | 5 | 133560.00 |
|  | white | Female | 35 | 115000.00 |
|  |  | Male | 44 | 121560.00 |
| 11-15 | white | Female | 18 | 104861.56 |
|  |  | Male | 24 | 138800.96 |
| 16-20 | person of color | Female | 6 | 113904.26 |
|  |  | Male | 7 | 114488.29 |
|  | white | Female | 20 | 113772.10 |
|  |  | Male | 16 | 139851.85 |
| 21-25 | person of color | Female | 6 | 152535.08 |
|  | white | Female | 16 | 131124.75 |
|  |  | Male | 28 | 149376.57 |
| 25+ | white | Female | 13 | 125000.00 |
|  |  | Male | 18 | 136276.77 |
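`race_grouping` collapses `race_ethnicity` into "white", "person of color" and "unknown", but the mapping is not shown in this section. The sketch below assumes "White" maps to white, "Prefer Not to Disclose" or a missing value maps to unknown, and every other category maps to person of color. The tables are consistent with the person-of-color part: among salaried 25-29-year-olds, the person-of-color count (46) equals Asian 17 + Black 10 + Hispanic 12 + Two or More 7. The function name `race_group` is hypothetical.

```python
import pandas as pd


def race_group(race_ethnicity: pd.Series) -> pd.Series:
    """Collapse race/ethnicity labels into three groups (assumed mapping, hypothetical helper)."""
    label = race_ethnicity.str.replace(" (United States of America)", "", regex=False)
    grouped = pd.Series("person of color", index=race_ethnicity.index)
    grouped.loc[label == "White"] = "white"
    grouped.loc[label.isna() | (label == "Prefer Not to Disclose")] = "unknown"
    return grouped


example = race_group(pd.Series([
    "White (United States of America)",
    "Asian (United States of America)",
    "Prefer Not to Disclose (United States of America)",
    None,
]))
print(example.tolist())
```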
current_news_yos_race_gender_hourly = news_hourly.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_yos_race_gender_hourly)
| years_of_service_grouped | race_grouping | gender | count_nonzero | median |
|---|---|---|---|---|
| 0 | person of color | Female | 8 | 35.45 |
| 1-2 | white | Female | 9 | 25.91 |
|  |  | Male | 6 | 24.57 |
| 3-5 | white | Male | 7 | 33.77 |
| 6-10 | white | Male | 5 | 32.77 |
| 21-25 | white | Female | 5 | 40.63 |
current_median_news_age_5_salaried = news_salaried.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_salaried)
| age_group_5 | count_nonzero | median |
|---|---|---|
| <25 | 16 | 70000.00 |
| 25-29 | 111 | 82780.00 |
| 30-34 | 106 | 95000.00 |
| 35-39 | 118 | 115975.00 |
| 40-44 | 71 | 128280.00 |
| 45-49 | 51 | 126840.00 |
| 50-54 | 70 | 121579.96 |
| 55-59 | 47 | 130000.00 |
| 60-64 | 49 | 155522.35 |
| 65+ | 18 | 140134.90 |
current_median_news_age_5_hourly = news_hourly.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_hourly)
| age_group_5 | count_nonzero | median |
|---|---|---|
| <25 | 14 | 32.70 |
| 25-29 | 15 | 21.71 |
| 30-34 | 14 | 34.62 |
| 35-39 | 5 | 36.05 |
| 40-44 | 5 | 27.41 |
| 50-54 | 7 | 44.67 |
| 55-59 | 7 | 39.04 |
| 60-64 | 7 | 35.69 |
| 65+ | 6 | 48.15 |
current_median_news_age_10_salaried = news_salaried.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_salaried)
| age_group_10 | count_nonzero | median |
|---|---|---|
| <25 | 16 | 70000.00 |
| 25-34 | 217 | 90000.00 |
| 35-44 | 189 | 120000.00 |
| 45-54 | 121 | 126572.33 |
| 55-64 | 96 | 143592.68 |
| 65+ | 18 | 140134.90 |
current_median_news_age_10_hourly = news_hourly.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_hourly)
| age_group_10 | count_nonzero | median |
|---|---|---|
| <25 | 14 | 32.70 |
| 25-34 | 29 | 33.33 |
| 35-44 | 10 | 34.09 |
| 45-54 | 10 | 47.98 |
| 55-64 | 14 | 37.36 |
| 65+ | 6 | 48.15 |
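The salaried counts suggest that each 10-year age group is the union of two 5-year groups: 111 + 106 = 217 for 25-34, 118 + 71 = 189 for 35-44, 51 + 70 = 121 for 45-54, and 47 + 49 = 96 for 55-64. The binning code is not shown here. The sketch below reproduces both sets of labels with `pd.cut`, assuming whole-year ages and left-closed bins. The edge lists are an assumption.

```python
import numpy as np
import pandas as pd

# Assumed edges for whole-year ages; right=False makes each bin [low, high).
AGE5_BINS = [0, 25, 30, 35, 40, 45, 50, 55, 60, 65, np.inf]
AGE5_LABELS = ["<25", "25-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-64", "65+"]
AGE10_BINS = [0, 25, 35, 45, 55, 65, np.inf]
AGE10_LABELS = ["<25", "25-34", "35-44", "45-54", "55-64", "65+"]

ages = pd.Series([24, 25, 34, 35, 64, 65])  # toy ages at the bin boundaries
age_group_5 = pd.cut(ages, bins=AGE5_BINS, labels=AGE5_LABELS, right=False)
age_group_10 = pd.cut(ages, bins=AGE10_BINS, labels=AGE10_LABELS, right=False)
print(age_group_5.tolist())
print(age_group_10.tolist())
```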
current_news_age_5_yos_salary = news_salaried.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_5_yos_salary)
| age_group_5 | years_of_service_grouped | count_nonzero | median |
|---|---|---|---|
| <25 | 0 | 7 | 68000.00 |
|  | 1-2 | 9 | 75780.00 |
| 25-29 | 0 | 24 | 90000.00 |
|  | 1-2 | 40 | 79780.00 |
|  | 3-5 | 41 | 82780.00 |
|  | 6-10 | 6 | 102972.57 |
| 30-34 | 0 | 11 | 90000.00 |
|  | 1-2 | 36 | 90000.00 |
|  | 3-5 | 33 | 98340.00 |
|  | 6-10 | 25 | 100000.00 |
| 35-39 | 0 | 8 | 112890.00 |
|  | 1-2 | 24 | 103560.00 |
|  | 3-5 | 31 | 108000.00 |
|  | 6-10 | 33 | 125246.54 |
|  | 11-15 | 20 | 119190.00 |
| 40-44 | 0 | 5 | 170000.00 |
|  | 1-2 | 14 | 127780.00 |
|  | 3-5 | 20 | 133640.00 |
|  | 6-10 | 11 | 110000.00 |
|  | 11-15 | 13 | 130000.00 |
|  | 16-20 | 8 | 104942.01 |
| 45-49 | 1-2 | 6 | 155810.00 |
|  | 3-5 | 14 | 147920.00 |
|  | 6-10 | 14 | 105406.36 |
|  | 16-20 | 5 | 129828.93 |
|  | 21-25 | 9 | 91625.25 |
| 50-54 | 1-2 | 5 | 129560.00 |
|  | 3-5 | 7 | 107855.60 |
|  | 6-10 | 8 | 101929.11 |
|  | 11-15 | 8 | 173790.76 |
|  | 16-20 | 20 | 116206.39 |
|  | 21-25 | 13 | 140206.92 |
|  | 25+ | 8 | 153760.44 |
| 55-59 | 6-10 | 11 | 117935.68 |
|  | 16-20 | 7 | 113734.51 |
|  | 21-25 | 13 | 143276.51 |
|  | 25+ | 7 | 130189.42 |
| 60-64 | 3-5 | 6 | 178715.00 |
|  | 6-10 | 5 | 98698.90 |
|  | 16-20 | 6 | 150234.76 |
|  | 21-25 | 16 | 163606.82 |
|  | 25+ | 14 | 152129.85 |
| 65+ | 25+ | 9 | 118652.79 |
current_news_age_5_yos_hourly = news_hourly.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_5_yos_hourly)
| age_group_5 | years_of_service_grouped | count_nonzero | median |
|---|---|---|---|
| <25 | 0 | 7 | 35.90 |
|  | 1-2 | 7 | 21.00 |
| 25-29 | 1-2 | 9 | 21.31 |
| 30-34 | 3-5 | 6 | 37.48 |
current_news_age_10_yos_salary = news_salaried.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_10_yos_salary)
| age_group_10 | years_of_service_grouped | count_nonzero | median |
|---|---|---|---|
| <25 | 0 | 7 | 68000.00 |
|  | 1-2 | 9 | 75780.00 |
| 25-34 | 0 | 35 | 90000.00 |
|  | 1-2 | 76 | 85921.00 |
|  | 3-5 | 74 | 90000.00 |
|  | 6-10 | 31 | 101625.00 |
| 35-44 | 0 | 13 | 120780.00 |
|  | 1-2 | 38 | 111485.00 |
|  | 3-5 | 51 | 120840.00 |
|  | 6-10 | 44 | 120560.00 |
|  | 11-15 | 33 | 127340.00 |
|  | 16-20 | 10 | 104942.01 |
| 45-54 | 1-2 | 11 | 147910.00 |
|  | 3-5 | 21 | 133000.00 |
|  | 6-10 | 22 | 103112.50 |
|  | 11-15 | 10 | 143180.71 |
|  | 16-20 | 25 | 118821.01 |
|  | 21-25 | 22 | 113669.49 |
|  | 25+ | 9 | 152730.88 |
| 55-64 | 1-2 | 5 | 160780.00 |
|  | 3-5 | 8 | 164006.68 |
|  | 6-10 | 16 | 114322.84 |
|  | 16-20 | 13 | 122195.48 |
|  | 21-25 | 29 | 146425.94 |
|  | 25+ | 21 | 138517.37 |
| 65+ | 25+ | 9 | 118652.79 |
current_news_age_10_yos_hourly = news_hourly.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_news_age_10_yos_hourly)
| age_group_10 | years_of_service_grouped | count_nonzero | median |
|---|---|---|---|
| <25 | 0 | 7 | 35.90 |
|  | 1-2 | 7 | 21.00 |
| 25-34 | 1-2 | 12 | 21.11 |
|  | 3-5 | 10 | 35.52 |
| 55-64 | 21-25 | 5 | 35.69 |
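Every cell in this section repeats one pattern, `groupby(keys).agg({'current_base_pay': [np.count_nonzero, np.median]})`, once for salaried and once for hourly staff. A loop over grouping keys would produce the same tables with less repetition. This is a minimal sketch: `GROUPINGS`, `build_tables` and the toy frame are hypothetical, and the string `"median"` stands in for `np.median`, giving the same column name without the pandas deprecation warning about numpy callables.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the grouping-key combinations used in this section.
GROUPINGS = {
    "gender": ["gender"],
    "desk_gender": ["desk", "gender"],
}


def build_tables(frames: dict[str, pd.DataFrame], groupings: dict[str, list[str]]) -> dict[str, pd.DataFrame]:
    """Run the same count/median aggregation for every (pay type, grouping) pair."""
    tables = {}
    for pay_type, df in frames.items():
        for name, keys in groupings.items():
            tables[f"{name}_{pay_type}"] = df.groupby(keys).agg(
                {"current_base_pay": [np.count_nonzero, "median"]}
            )
    return tables


# Toy data (hypothetical); in the notebook the frames would be news_salaried and news_hourly.
toy = pd.DataFrame({
    "desk": ["Sports", "Sports", "Style"],
    "gender": ["Female", "Male", "Female"],
    "current_base_pay": [100.0, 110.0, 95.0],
})
tables = build_tables({"salaried": toy}, GROUPINGS)
print(tables["gender_salaried"])
```

Each resulting table could then be passed through the notebook's `suppress` before display.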
current_median_news_age_5_gender_salaried = news_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_gender_salaried)
| age_group_5 | gender | count_nonzero | median |
|---|---|---|---|
| <25 | Female | 13 | 69280.00 |
| 25-29 | Female | 79 | 83780.00 |
|  | Male | 32 | 80830.00 |
| 30-34 | Female | 61 | 93000.00 |
|  | Male | 44 | 99170.00 |
| 35-39 | Female | 58 | 105790.00 |
|  | Male | 60 | 127373.27 |
| 40-44 | Female | 28 | 121560.00 |
|  | Male | 43 | 131410.00 |
| 45-49 | Female | 23 | 133000.00 |
|  | Male | 28 | 124568.47 |
| 50-54 | Female | 27 | 113280.00 |
|  | Male | 43 | 134140.00 |
| 55-59 | Female | 21 | 120363.00 |
|  | Male | 26 | 140697.96 |
| 60-64 | Female | 21 | 155522.35 |
|  | Male | 28 | 152863.59 |
| 65+ | Female | 5 | 162355.42 |
|  | Male | 13 | 128180.00 |
current_median_news_age_5_gender_hourly = news_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_gender_hourly)
| age_group_5 | gender | count_nonzero | median |
|---|---|---|---|
| <25 | Female | 11 | 32.65 |
| 25-29 | Female | 8 | 33.85 |
|  | Male | 7 | 17.30 |
| 30-34 | Female | 7 | 33.33 |
|  | Male | 7 | 37.18 |
| 50-54 | Female | 5 | 44.67 |
| 65+ | Female | 5 | 44.46 |
current_median_news_age_10_gender_salaried = news_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_gender_salaried)
| age_group_10 | gender | count_nonzero | median |
|---|---|---|---|
| <25 | Female | 13 | 69280.00 |
| 25-34 | Female | 140 | 90000.00 |
|  | Male | 76 | 90140.00 |
| 35-44 | Female | 86 | 110016.26 |
|  | Male | 103 | 130000.00 |
| 45-54 | Female | 50 | 125350.00 |
|  | Male | 71 | 127840.00 |
| 55-64 | Female | 42 | 140669.27 |
|  | Male | 54 | 144111.84 |
| 65+ | Female | 5 | 162355.42 |
|  | Male | 13 | 128180.00 |
current_median_news_age_10_gender_hourly = news_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_gender_hourly)
| age_group_10 | gender | count_nonzero | median |
|---|---|---|---|
| <25 | Female | 11 | 32.65 |
| 25-34 | Female | 15 | 33.85 |
|  | Male | 14 | 23.56 |
| 35-44 | Female | 5 | 36.05 |
|  | Male | 5 | 32.77 |
| 45-54 | Female | 7 | 51.30 |
| 55-64 | Female | 7 | 42.94 |
|  | Male | 7 | 34.77 |
| 65+ | Female | 5 | 44.46 |
current_median_news_age_5_race_salaried = news_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_salaried)
| age_group_5 | race_ethnicity | count_nonzero | median |
|---|---|---|---|
| <25 | White (United States of America) | 9 | 68000.00 |
| 25-29 | Asian (United States of America) | 17 | 90000.00 |
|  | Black or African American (United States of America) | 10 | 82000.00 |
|  | Hispanic or Latino (United States of America) | 12 | 86690.00 |
|  | Two or More Races (United States of America) | 7 | 79000.00 |
|  | White (United States of America) | 60 | 80670.00 |
| 30-34 | Asian (United States of America) | 9 | 120000.00 |
|  | Black or African American (United States of America) | 11 | 92000.00 |
|  | Hispanic or Latino (United States of America) | 8 | 93894.94 |
|  | Prefer Not to Disclose (United States of America) | 5 | 95000.00 |
|  | White (United States of America) | 62 | 95104.86 |
| 35-39 | Asian (United States of America) | 12 | 108170.00 |
|  | Black or African American (United States of America) | 7 | 120000.00 |
|  | Hispanic or Latino (United States of America) | 8 | 100280.00 |
|  | White (United States of America) | 78 | 116420.00 |
| 40-44 | Asian (United States of America) | 8 | 117246.75 |
|  | Black or African American (United States of America) | 10 | 133640.00 |
|  | Hispanic or Latino (United States of America) | 7 | 96780.08 |
|  | White (United States of America) | 42 | 129390.00 |
| 45-49 | White (United States of America) | 39 | 129828.93 |
| 50-54 | Asian (United States of America) | 5 | 118821.01 |
|  | Black or African American (United States of America) | 9 | 110560.00 |
|  | White (United States of America) | 50 | 129625.00 |
| 55-59 | Black or African American (United States of America) | 6 | 133849.19 |
|  | White (United States of America) | 37 | 130000.00 |
| 60-64 | White (United States of America) | 42 | 144428.01 |
| 65+ | White (United States of America) | 18 | 140134.90 |
current_median_news_age_5_race_hourly = news_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_hourly)
| age_group_5 | race_ethnicity | count_nonzero | median |
|---|---|---|---|
| <25 | White (United States of America) | 6 | 32.70 |
| 25-29 | White (United States of America) | 10 | 20.88 |
| 30-34 | White (United States of America) | 10 | 34.36 |
| 35-39 | White (United States of America) | 5 | 36.05 |
| 50-54 | White (United States of America) | 6 | 39.42 |
| 55-59 | White (United States of America) | 5 | 39.04 |
| 60-64 | White (United States of America) | 7 | 35.69 |
current_median_news_age_5_race_group_salaried = news_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_salaried)
| age_group_5 | race_grouping | count_nonzero | median |
|---|---|---|---|
| <25 | person of color | 7 | 80060.00 |
|  | white | 9 | 68000.00 |
| 25-29 | person of color | 46 | 85641.00 |
|  | unknown | 5 | 90000.00 |
|  | white | 60 | 80670.00 |
| 30-34 | person of color | 31 | 93000.00 |
|  | unknown | 13 | 98340.00 |
|  | white | 62 | 95104.86 |
| 35-39 | person of color | 29 | 108000.00 |
|  | unknown | 11 | 133560.00 |
|  | white | 78 | 116420.00 |
| 40-44 | person of color | 28 | 125812.50 |
|  | white | 42 | 129390.00 |
| 45-49 | person of color | 9 | 103970.05 |
|  | white | 39 | 129828.93 |
| 50-54 | person of color | 19 | 117924.49 |
|  | white | 50 | 129625.00 |
| 55-59 | person of color | 8 | 104560.78 |
|  | white | 37 | 130000.00 |
| 60-64 | white | 42 | 144428.01 |
| 65+ | white | 18 | 140134.90 |
current_median_news_age_5_race_group_hourly = news_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_hourly)
| age_group_5 | race_grouping | count_nonzero | median |
|---|---|---|---|
| <25 | person of color | 8 | 33.70 |
|  | white | 6 | 32.70 |
| 25-29 | person of color | 5 | 25.41 |
|  | white | 10 | 20.88 |
| 30-34 | white | 10 | 34.36 |
| 35-39 | white | 5 | 36.05 |
| 50-54 | white | 6 | 39.42 |
| 55-59 | white | 5 | 39.04 |
| 60-64 | white | 7 | 35.69 |
current_median_news_age_10_race_salaried = news_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_salaried)
| age_group_10 | race_ethnicity | count_nonzero | median |
|---|---|---|---|
| <25 | White (United States of America) | 9 | 68000.00 |
| 25-34 | Asian (United States of America) | 26 | 90530.00 |
|  | Black or African American (United States of America) | 21 | 90000.00 |
|  | Hispanic or Latino (United States of America) | 20 | 91254.94 |
|  | Prefer Not to Disclose (United States of America) | 9 | 90000.00 |
|  | Two or More Races (United States of America) | 10 | 89500.00 |
|  | White (United States of America) | 122 | 88870.00 |
| 35-44 | Asian (United States of America) | 20 | 110016.26 |
|  | Black or African American (United States of America) | 17 | 131090.00 |
|  | Hispanic or Latino (United States of America) | 15 | 96780.08 |
|  | Two or More Races (United States of America) | 5 | 128280.00 |
|  | White (United States of America) | 120 | 119280.00 |
| 45-54 | Asian (United States of America) | 7 | 118821.01 |
|  | Black or African American (United States of America) | 12 | 112524.14 |
|  | Hispanic or Latino (United States of America) | 6 | 134232.40 |
|  | White (United States of America) | 89 | 129690.00 |
| 55-64 | Black or African American (United States of America) | 8 | 138188.95 |
|  | White (United States of America) | 79 | 138119.40 |
| 65+ | White (United States of America) | 18 | 140134.90 |
current_median_news_age_10_race_hourly = news_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_hourly)
| age_group_10 | race_ethnicity | count_nonzero | median |
|---|---|---|---|
| <25 | White (United States of America) | 6 | 32.70 |
| 25-34 | White (United States of America) | 20 | 32.48 |
| 35-44 | White (United States of America) | 7 | 35.40 |
| 45-54 | White (United States of America) | 9 | 44.67 |
| 55-64 | White (United States of America) | 12 | 37.36 |
current_median_news_age_10_race_group_salaried = news_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_salaried)
| age_group_10 | race_grouping | count_nonzero | median |
|---|---|---|---|
| <25 | person of color | 7 | 80060.00 |
|  | white | 9 | 68000.00 |
| 25-34 | person of color | 77 | 90000.00 |
|  | unknown | 18 | 92500.00 |
|  | white | 122 | 88870.00 |
| 35-44 | person of color | 57 | 110560.00 |
|  | unknown | 12 | 135060.00 |
|  | white | 120 | 119280.00 |
| 45-54 | person of color | 28 | 116206.39 |
|  | white | 89 | 129690.00 |
| 55-64 | person of color | 12 | 138188.95 |
|  | unknown | 5 | 165742.33 |
|  | white | 79 | 138119.40 |
| 65+ | white | 18 | 140134.90 |
current_median_news_age_10_race_group_hourly = news_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_hourly)
| age_group_10 | race_grouping | count_nonzero | median |
|---|---|---|---|
| <25 | person of color | 8 | 33.70 |
|  | white | 6 | 32.70 |
| 25-34 | person of color | 8 | 29.37 |
|  | white | 20 | 32.48 |
| 35-44 | white | 7 | 35.40 |
| 45-54 | white | 9 | 44.67 |
| 55-64 | white | 12 | 37.36 |
current_median_news_age_5_race_gender_salaried = news_salaried.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_gender_salaried)
| age_group_5 | race_ethnicity | gender | count_nonzero | median |
|---|---|---|---|---|
| <25 | White (United States of America) | Female | 8 | 66640.00 |
| 25-29 | Asian (United States of America) | Female | 14 | 90000.00 |
|  | Black or African American (United States of America) | Female | 8 | 79640.00 |
|  | Hispanic or Latino (United States of America) | Female | 9 | 92500.00 |
|  | White (United States of America) | Female | 41 | 80855.00 |
|  |  | Male | 19 | 80560.00 |
| 30-34 | Asian (United States of America) | Female | 6 | 128420.00 |
|  | Black or African American (United States of America) | Female | 8 | 90780.00 |
|  | Hispanic or Latino (United States of America) | Female | 5 | 97780.00 |
|  | White (United States of America) | Female | 32 | 90383.33 |
|  |  | Male | 29 | 100000.00 |
| 35-39 | Asian (United States of America) | Female | 8 | 106340.00 |
|  | Black or African American (United States of America) | Male | 5 | 132910.00 |
|  | Hispanic or Latino (United States of America) | Female | 6 | 92390.00 |
|  | White (United States of America) | Female | 35 | 109595.00 |
|  |  | Male | 43 | 123780.00 |
| 40-44 | Asian (United States of America) | Female | 6 | 110016.26 |
|  | Black or African American (United States of America) | Male | 6 | 136265.00 |
|  | Hispanic or Latino (United States of America) | Male | 6 | 95780.04 |
|  | White (United States of America) | Female | 13 | 115162.62 |
|  |  | Male | 29 | 142780.00 |
| 45-49 | White (United States of America) | Female | 18 | 137615.02 |
|  |  | Male | 21 | 125340.00 |
| 50-54 | Black or African American (United States of America) | Male | 6 | 112524.14 |
|  | White (United States of America) | Female | 22 | 116765.39 |
|  |  | Male | 28 | 136806.42 |
| 55-59 | White (United States of America) | Female | 17 | 117935.68 |
|  |  | Male | 20 | 146689.04 |
| 60-64 | White (United States of America) | Female | 16 | 140669.27 |
|  |  | Male | 26 | 152863.59 |
| 65+ | White (United States of America) | Female | 5 | 162355.42 |
|  |  | Male | 13 | 128180.00 |
current_median_news_age_5_race_gender_hourly = news_hourly.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_gender_hourly)
| age_group_5 | race_ethnicity | gender | count_nonzero | median |
|---|---|---|---|---|
| 25-29 | White (United States of America) | Female | 6 | 34.88 |
| 30-34 | White (United States of America) | Male | 6 | 37.48 |
| 50-54 | White (United States of America) | Female | 5 | 44.67 |
current_median_news_age_5_race_group_gender_salaried = news_salaried.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_gender_salaried)
| age_group_5 | race_grouping | gender | count_nonzero | median |
|---|---|---|---|---|
| <25 | person of color | Female | 5 | 72500.00 |
|  | white | Female | 8 | 66640.00 |
| 25-29 | person of color | Female | 35 | 90000.00 |
|  |  | Male | 11 | 80880.00 |
|  | white | Female | 41 | 80855.00 |
|  |  | Male | 19 | 80560.00 |
| 30-34 | person of color | Female | 21 | 94280.00 |
|  |  | Male | 10 | 90500.00 |
|  | unknown | Female | 8 | 92500.00 |
|  |  | Male | 5 | 130000.00 |
|  | white | Female | 32 | 90383.33 |
|  |  | Male | 29 | 100000.00 |
| 35-39 | person of color | Female | 18 | 97420.00 |
|  |  | Male | 11 | 132910.00 |
|  | unknown | Female | 5 | 136560.00 |
|  |  | Male | 6 | 131780.00 |
|  | white | Female | 35 | 109595.00 |
|  |  | Male | 43 | 123780.00 |
| 40-44 | person of color | Female | 14 | 129140.00 |
|  |  | Male | 14 | 122592.50 |
|  | white | Female | 13 | 115162.62 |
|  |  | Male | 29 | 142780.00 |
| 45-49 | person of color | Male | 5 | 90443.52 |
|  | white | Female | 18 | 137615.02 |
|  |  | Male | 21 | 125340.00 |
| 50-54 | person of color | Female | 5 | 104504.47 |
|  |  | Male | 14 | 126992.52 |
|  | white | Female | 22 | 116765.39 |
|  |  | Male | 28 | 136806.42 |
| 55-59 | white | Female | 17 | 117935.68 |
|  |  | Male | 20 | 146689.04 |
| 60-64 | white | Female | 16 | 140669.27 |
|  |  | Male | 26 | 152863.59 |
| 65+ | white | Female | 5 | 162355.42 |
|  |  | Male | 13 | 128180.00 |
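Three-level tables like the one above are long and make the Female/Male comparison hard to read. A wider layout puts gender in the columns with `pd.pivot_table`, giving one row per age and race group with count and median side by side. This is a sketch of that presentation on hypothetical employee-level rows. The column names follow the notebook, but the pay values are invented for illustration, and `"count"` here skips missing pay, unlike `np.count_nonzero`.

```python
import pandas as pd

# Hypothetical employee-level rows; column names follow the notebook, values are invented.
toy = pd.DataFrame({
    "age_group_5": ["40-44"] * 4,
    "race_grouping": ["person of color", "person of color", "white", "white"],
    "gender": ["Female", "Male", "Female", "Male"],
    "current_base_pay": [129000.0, 122000.0, 115000.0, 143000.0],
})

# One row per (age group, race group); columns are (statistic, gender).
wide = pd.pivot_table(
    toy,
    index=["age_group_5", "race_grouping"],
    columns="gender",
    values="current_base_pay",
    aggfunc=["count", "median"],
)
print(wide)
```

Small-group suppression would still need to be applied per cell, for example by masking medians wherever the matching count is below the threshold.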
current_median_news_age_5_race_group_gender_hourly = news_hourly.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_5_race_group_gender_hourly)
| age_group_5 | race_grouping | gender | count_nonzero | median |
|---|---|---|---|---|
| <25 | person of color | Female | 7 | 32.40 |
| 25-29 | white | Female | 6 | 34.88 |
| 30-34 | white | Male | 6 | 37.48 |
| 50-54 | white | Female | 5 | 44.67 |
current_median_news_age_10_race_gender_salaried = news_salaried.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_gender_salaried)
| age_group_10 | race_ethnicity | gender | count_nonzero | median |
|---|---|---|---|---|
| <25 | White (United States of America) | Female | 8 | 66640.00 |
| 25-34 | Asian (United States of America) | Female | 20 | 92670.00 |
|  |  | Male | 6 | 86500.00 |
|  | Black or African American (United States of America) | Female | 16 | 87281.00 |
|  |  | Male | 5 | 92000.00 |
|  | Hispanic or Latino (United States of America) | Female | 14 | 92780.00 |
|  |  | Male | 6 | 80330.00 |
|  | Prefer Not to Disclose (United States of America) | Male | 5 | 98340.00 |
|  | Two or More Races (United States of America) | Female | 6 | 90390.00 |
|  | White (United States of America) | Female | 73 | 85740.00 |
|  |  | Male | 48 | 92140.00 |
| 35-44 | Asian (United States of America) | Female | 14 | 109112.01 |
|  |  | Male | 6 | 125812.50 |
|  | Black or African American (United States of America) | Female | 6 | 107853.74 |
|  |  | Male | 11 | 136190.00 |
|  | Hispanic or Latino (United States of America) | Female | 7 | 94780.00 |
|  |  | Male | 8 | 103390.04 |
|  | Two or More Races (United States of America) | Female | 5 | 128280.00 |
|  | White (United States of America) | Female | 48 | 112297.50 |
|  |  | Male | 72 | 130705.00 |
| 45-54 | Black or African American (United States of America) | Male | 8 | 112524.14 |
|  | Hispanic or Latino (United States of America) | Male | 6 | 134232.40 |
|  | White (United States of America) | Female | 40 | 129625.00 |
|  |  | Male | 49 | 129828.93 |
| 55-64 | Black or African American (United States of America) | Female | 5 | 174945.88 |
|  | White (United States of America) | Female | 33 | 127573.14 |
|  |  | Male | 46 | 147524.37 |
| 65+ | White (United States of America) | Female | 5 | 162355.42 |
|  |  | Male | 13 | 128180.00 |
current_median_news_age_10_race_gender_hourly = news_hourly.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_gender_hourly)
| age_group_10 | race_ethnicity | gender | count_nonzero | median |
|---|---|---|---|---|
| 25-34 | White (United States of America) | Female | 10 | 32.48 |
|  |  | Male | 10 | 28.05 |
| 45-54 | White (United States of America) | Female | 7 | 51.30 |
| 55-64 | White (United States of America) | Female | 5 | 42.94 |
|  |  | Male | 7 | 34.77 |
current_median_news_age_10_race_group_gender_salaried = news_salaried.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_gender_salaried)
| age_group_10 | race_grouping | gender | count_nonzero | median |
|---|---|---|---|---|
| <25 | person of color | Female | 5 | 72500.00 |
|  | white | Female | 8 | 66640.00 |
| 25-34 | person of color | Female | 56 | 90394.94 |
|  |  | Male | 21 | 88000.00 |
|  | unknown | Female | 11 | 90000.00 |
|  |  | Male | 7 | 102780.00 |
|  | white | Female | 73 | 85740.00 |
|  |  | Male | 48 | 92140.00 |
| 35-44 | person of color | Female | 32 | 109112.01 |
|  |  | Male | 25 | 124345.00 |
|  | unknown | Female | 6 | 141810.00 |
|  |  | Male | 6 | 131780.00 |
|  | white | Female | 48 | 112297.50 |
|  |  | Male | 72 | 130705.00 |
| 45-54 | person of color | Female | 9 | 117924.49 |
|  |  | Male | 19 | 114488.29 |
|  | white | Female | 40 | 129625.00 |
|  |  | Male | 49 | 129828.93 |
| 55-64 | person of color | Female | 7 | 180000.00 |
|  |  | Male | 5 | 98411.56 |
|  | white | Female | 33 | 127573.14 |
|  |  | Male | 46 | 147524.37 |
| 65+ | white | Female | 5 | 162355.42 |
|  |  | Male | 13 | 128180.00 |
current_median_news_age_10_race_group_gender_hourly = news_hourly.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_news_age_10_race_group_gender_hourly)
| age_group_10 | race_grouping | gender | count_nonzero | median |
|---|---|---|---|---|
| <25 | person of color | Female | 7 | 32.40 |
| 25-34 | white | Female | 10 | 32.48 |
|  |  | Male | 10 | 28.05 |
| 45-54 | white | Female | 7 | 51.30 |
| 55-64 | white | Female | 5 | 42.94 |
|  |  | Male | 7 | 34.77 |
current_news_median_desk_salaried = news_salaried.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_salaried)
| desk | count_nonzero | median |
|---|---|---|
| National | 114 | 158242.11 |
| Foreign | 33 | 142780.00 |
| non-newsroom | 18 | 140840.00 |
| Financial | 43 | 137140.00 |
| Editorial | 33 | 128560.27 |
| Local | 68 | 113140.00 |
| Style | 50 | 111833.47 |
| Universal Desk | 6 | 107876.01 |
| Sports | 43 | 107560.00 |
| Outlook | 6 | 105497.50 |
| Graphics | 21 | 104320.14 |
| Photography | 27 | 98340.00 |
| Video | 46 | 94420.00 |
| Audio | 8 | 93140.00 |
| Multiplatform | 44 | 90810.00 |
| Audience Development and Engagement | 28 | 89080.00 |
| Design | 33 | 88000.00 |
| Emerging News Products | 26 | 75000.00 |
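Unlike the earlier tables, the desk tables are printed with `suppress_median`, and their rows come out ordered from highest to lowest median. Its definition is not shown in this section. A plausible reading is "suppress small groups, then sort by median, descending", sketched below under that assumption. The function name `suppress_and_rank`, the threshold of 5 and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def suppress_and_rank(table: pd.DataFrame, min_n: int = 5) -> pd.DataFrame:
    """Hypothetical stand-in for `suppress_median`: hide small groups, then rank by median pay."""
    counts = table[("current_base_pay", "count_nonzero")]
    kept = table[counts >= min_n]
    return kept.sort_values(("current_base_pay", "median"), ascending=False)


# Toy salaries (hypothetical): Audio has only two people, so it is hidden.
toy = pd.DataFrame({
    "desk": ["Sports"] * 5 + ["National"] * 5 + ["Audio"] * 2,
    "current_base_pay": [100, 105, 110, 115, 120, 150, 155, 160, 165, 170, 200, 210],
})
table = toy.groupby("desk").agg({"current_base_pay": [np.count_nonzero, "median"]})
ranked = suppress_and_rank(table)
print(ranked)
```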
current_news_median_desk_hourly = news_hourly.groupby(['desk']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_hourly)
desk | count_nonzero | median |
---|---|---|
Universal Desk | 8 | 44.16 |
Editorial | 8 | 43.27 |
National | 11 | 33.85 |
Multiplatform | 7 | 32.77 |
Sports | 16 | 27.74 |
Operations | 8 | 22.68 |
Style | 6 | 20.95 |
current_news_median_desk_gender_salaried = news_salaried.groupby(['desk','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_gender_salaried)
desk | gender | count_nonzero | median |
---|---|---|---|
National | Male | 60 | 178737.50 |
Editorial | Male | 18 | 156476.68 |
non-newsroom | Male | 11 | 151780.00 |
Foreign | Male | 16 | 150780.00 |
National | Female | 54 | 145602.97 |
Financial | Male | 27 | 137340.00 |
Financial | Female | 16 | 133666.00 |
Foreign | Female | 17 | 129780.00 |
non-newsroom | Female | 7 | 129060.00 |
Sports | Female | 10 | 125935.00 |
Local | Male | 32 | 122011.17 |
Style | Male | 22 | 118280.00 |
Local | Female | 36 | 108075.80 |
Photography | Male | 17 | 107940.95 |
Style | Female | 28 | 107060.00 |
Graphics | Female | 11 | 104320.14 |
Sports | Male | 33 | 103685.00 |
Editorial | Female | 15 | 102692.61 |
Graphics | Male | 9 | 101640.00 |
Multiplatform | Male | 16 | 96356.89 |
Video | Male | 17 | 95000.00 |
Audience Development and Engagement | Male | 9 | 94780.00 |
Audio | Female | 5 | 94280.00 |
Design | Male | 14 | 93640.00 |
Photography | Female | 10 | 91847.57 |
Video | Female | 29 | 91560.00 |
Audience Development and Engagement | Female | 19 | 88880.00 |
Multiplatform | Female | 28 | 88560.00 |
Design | Female | 19 | 82201.52 |
Emerging News Products | Male | 7 | 76840.01 |
Emerging News Products | Female | 19 | 73440.00 |
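
Because this table is sorted by median, a desk's male and female rows can sit far apart. A small sketch of a more direct comparison: pivot gender into columns so each desk is one row, then take the female-to-male ratio of medians and the dollar difference. The four medians below are copied from the table above for the Financial and Local desks only; on the full data the same idea could start from `current_news_median_desk_gender_salaried[('current_base_pay', 'median')].unstack('gender')`, ideally after dropping small groups. A raw ratio of medians does not account for differences in job, tier, or tenure.

```python
import pandas as pd

# Medians copied from the salaried desk-by-gender table above (two desks only)
medians = pd.Series(
    [137340.00, 133666.00, 122011.17, 108075.80],
    index=pd.MultiIndex.from_tuples(
        [('Financial', 'Male'), ('Financial', 'Female'),
         ('Local', 'Male'), ('Local', 'Female')],
        names=['desk', 'gender'],
    ),
    name='median',
)

# One row per desk, one column per gender
wide = medians.unstack('gender')

# Female median as a fraction of the male median, and the gap in dollars
wide['female_to_male'] = wide['Female'] / wide['Male']
wide['dollar_gap'] = wide['Male'] - wide['Female']
print(wide.round(3))
```

With these inputs the ratio is about 0.973 for Financial and about 0.886 for Local, a gap of $3,674.00 and $13,935.37 respectively.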
current_news_median_desk_gender_hourly = news_hourly.groupby(['desk','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_gender_hourly)
desk | gender | count_nonzero | median |
---|---|---|---|
Universal Desk | Female | 5 | 43.87 |
Editorial | Female | 6 | 43.27 |
National | Female | 8 | 39.42 |
Sports | Male | 13 | 33.77 |
Style | Female | 5 | 20.91 |
current_news_median_desk_race_salaried = news_salaried.groupby(['desk','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_salaried)
desk | race_grouping | count_nonzero | median |
---|---|---|---|
National | white | 83 | 177838.45 |
non-newsroom | white | 13 | 151780.00 |
Foreign | unknown | 25 | 144340.00 |
Financial | white | 33 | 137799.88 |
National | person of color | 28 | 135595.00 |
Editorial | white | 25 | 134806.25 |
Financial | person of color | 8 | 131420.00 |
Foreign | white | 6 | 124843.27 |
Style | white | 38 | 115471.31 |
Local | white | 49 | 113280.00 |
Local | person of color | 19 | 113000.00 |
Sports | white | 29 | 107560.00 |
Graphics | white | 11 | 106340.00 |
Photography | white | 19 | 106340.00 |
Editorial | person of color | 7 | 104560.00 |
Style | person of color | 12 | 104243.74 |
Graphics | person of color | 8 | 102980.07 |
Sports | person of color | 14 | 102780.00 |
Photography | person of color | 7 | 95000.00 |
Video | white | 27 | 95000.00 |
Multiplatform | white | 36 | 92547.18 |
Multiplatform | person of color | 8 | 90611.76 |
Design | person of color | 15 | 90009.88 |
Audience Development and Engagement | white | 17 | 89280.00 |
Audience Development and Engagement | person of color | 11 | 88280.00 |
Video | person of color | 18 | 87000.00 |
Design | white | 17 | 85060.00 |
Emerging News Products | white | 18 | 76700.01 |
Emerging News Products | person of color | 7 | 71853.00 |
current_news_median_desk_race_hourly = news_hourly.groupby(['desk','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_hourly)
desk | race_ethnicity | count_nonzero | median |
---|---|---|---|
Universal Desk | White (United States of America) | 6 | 44.16 |
Editorial | White (United States of America) | 6 | 39.49 |
National | White (United States of America) | 7 | 33.85 |
Sports | White (United States of America) | 15 | 33.77 |
Operations | White (United States of America) | 5 | 28.54 |
current_news_median_desk_race_gender_salaried = news_salaried.groupby(['desk','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_salaried)
desk | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
National | White (United States of America) | Male | 47 | 186065.00 |
Sports | White (United States of America) | Female | 5 | 160089.98 |
Editorial | White (United States of America) | Male | 15 | 158673.36 |
non-newsroom | White (United States of America) | Male | 9 | 151840.00 |
National | White (United States of America) | Female | 36 | 148550.00 |
National | Asian (United States of America) | Female | 8 | 144985.00 |
Financial | White (United States of America) | Male | 23 | 142027.50 |
Sports | Black or African American (United States of America) | Male | 5 | 136340.00 |
National | Black or African American (United States of America) | Male | 7 | 136190.00 |
Financial | White (United States of America) | Female | 10 | 129876.00 |
Local | White (United States of America) | Male | 25 | 121113.20 |
Photography | White (United States of America) | Male | 9 | 119340.00 |
Graphics | White (United States of America) | Male | 5 | 117131.00 |
Style | White (United States of America) | Male | 18 | 116175.06 |
Style | White (United States of America) | Female | 20 | 111833.47 |
Local | White (United States of America) | Female | 24 | 108075.80 |
Editorial | White (United States of America) | Female | 10 | 107200.36 |
Graphics | White (United States of America) | Female | 5 | 104780.00 |
Audience Development and Engagement | White (United States of America) | Male | 7 | 103060.00 |
Sports | White (United States of America) | Male | 24 | 102653.66 |
Multiplatform | White (United States of America) | Male | 13 | 96600.08 |
Video | White (United States of America) | Female | 13 | 95000.00 |
Video | White (United States of America) | Male | 14 | 94500.00 |
Photography | Black or African American (United States of America) | Male | 6 | 93500.00 |
Design | White (United States of America) | Male | 6 | 92500.00 |
Photography | White (United States of America) | Female | 10 | 91847.57 |
Video | Black or African American (United States of America) | Female | 5 | 90000.00 |
Multiplatform | White (United States of America) | Female | 23 | 89840.00 |
Audience Development and Engagement | White (United States of America) | Female | 10 | 86330.00 |
Design | White (United States of America) | Female | 11 | 82201.52 |
Video | Hispanic or Latino (United States of America) | Female | 5 | 78000.00 |
Emerging News Products | White (United States of America) | Female | 14 | 72940.00 |
current_news_median_desk_race_gender_hourly = news_hourly.groupby(['desk','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_hourly)
desk | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
Editorial | White (United States of America) | Female | 5 | 42.94 |
National | White (United States of America) | Female | 5 | 33.85 |
Sports | White (United States of America) | Male | 12 | 33.77 |
current_news_median_desk_race_group_gender_salaried = news_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_salaried)
desk | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
National | white | Male | 47 | 186065.00 |
Sports | white | Female | 5 | 160089.98 |
Editorial | white | Male | 15 | 158673.36 |
non-newsroom | white | Male | 9 | 151840.00 |
National | white | Female | 36 | 148550.00 |
Foreign | unknown | Male | 12 | 147670.00 |
Financial | white | Male | 23 | 142027.50 |
Financial | person of color | Female | 6 | 141070.00 |
National | person of color | Female | 16 | 137890.00 |
Foreign | unknown | Female | 13 | 136560.00 |
National | person of color | Male | 12 | 135050.00 |
Financial | white | Female | 10 | 129876.00 |
Local | person of color | Male | 7 | 122909.15 |
Local | white | Male | 25 | 121113.20 |
Photography | white | Male | 9 | 119340.00 |
Graphics | white | Male | 5 | 117131.00 |
Style | white | Male | 18 | 116175.06 |
Style | white | Female | 20 | 111833.47 |
Sports | person of color | Male | 9 | 110560.00 |
Local | white | Female | 24 | 108075.80 |
Editorial | white | Female | 10 | 107200.36 |
Local | person of color | Female | 12 | 106890.00 |
Graphics | white | Female | 5 | 104780.00 |
Graphics | person of color | Female | 5 | 104320.14 |
Audience Development and Engagement | white | Male | 7 | 103060.00 |
Sports | white | Male | 24 | 102653.66 |
Multiplatform | white | Male | 13 | 96600.08 |
Style | person of color | Female | 8 | 96353.74 |
Sports | person of color | Female | 5 | 95000.00 |
Video | white | Female | 13 | 95000.00 |
Photography | person of color | Male | 7 | 95000.00 |
Video | white | Male | 14 | 94500.00 |
Design | person of color | Male | 7 | 92500.00 |
Design | white | Male | 6 | 92500.00 |
Photography | white | Female | 10 | 91847.57 |
Audience Development and Engagement | person of color | Female | 9 | 90000.00 |
Multiplatform | white | Female | 23 | 89840.00 |
Audience Development and Engagement | white | Female | 10 | 86330.00 |
Design | person of color | Female | 8 | 85789.94 |
Multiplatform | person of color | Female | 5 | 84220.51 |
Video | person of color | Female | 15 | 84000.00 |
Design | white | Female | 11 | 82201.52 |
Emerging News Products | white | Female | 14 | 72940.00 |
current_news_median_desk_race_group_gender_hourly = news_hourly.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_hourly)
desk | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
Editorial | white | Female | 5 | 42.94 |
National | white | Female | 5 | 33.85 |
Sports | white | Male | 12 | 33.77 |
current_news_median_desk_race_gender_age5_salaried = news_salaried.groupby(['desk','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_age5_salaried)
desk | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
National | White (United States of America) | Female | 50-54 | 5.00 | 194690.00 |
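National | White (United States of America) | Male | 50-54 | 5.00 | 191325.00 |
National | White (United States of America) | Male | 60-64 | 6.00 | 185984.24 |
National | White (United States of America) | Male | 35-39 | 12.00 | 184065.00 |
National | White (United States of America) | Male | 40-44 | 10.00 | 177347.15 |
National | White (United States of America) | Female | 55-59 | 6.00 | 153515.00 |
National | White (United States of America) | Female | 45-49 | 5.00 | 147910.00 |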
Financial | White (United States of America) | Male | 40-44 | 5.00 | 146315.00 |
Sports | White (United States of America) | Male | 35-39 | 7.00 | 143680.00 |
Sports | White (United States of America) | Male | 50-54 | 7.00 | 140206.92 |
Local | White (United States of America) | Male | 55-59 | 5.00 | 138119.40 |
National | White (United States of America) | Female | 35-39 | 6.00 | 122735.00 |
Local | White (United States of America) | Female | 50-54 | 5.00 | 113280.00 |
Style | White (United States of America) | Female | 35-39 | 6.00 | 102552.82 |
Video | White (United States of America) | Male | 30-34 | 6.00 | 89750.00 |
Design | White (United States of America) | Female | 25-29 | 5.00 | 78000.00 |
Emerging News Products | White (United States of America) | Female | 25-29 | 5.00 | 72440.00 |
current_news_median_desk_race_gender_age5_hourly = news_hourly.groupby(['desk','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_gender_age5_hourly)
desk | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
current_news_median_desk_race_group_gender_age5_salaried = news_salaried.groupby(['desk','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_age5_salaried)
desk | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
National | white | Female | 50-54 | 5.00 | 194690.00 |
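National | white | Male | 50-54 | 5.00 | 191325.00 |
National | white | Male | 60-64 | 6.00 | 185984.24 |
National | white | Male | 35-39 | 12.00 | 184065.00 |
National | white | Male | 40-44 | 10.00 | 177347.15 |
National | white | Female | 55-59 | 6.00 | 153515.00 |
National | white | Female | 45-49 | 5.00 | 147910.00 |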
Financial | white | Male | 40-44 | 5.00 | 146315.00 |
Sports | white | Male | 35-39 | 7.00 | 143680.00 |
Sports | white | Male | 50-54 | 7.00 | 140206.92 |
Local | white | Male | 55-59 | 5.00 | 138119.40 |
Foreign | unknown | Male | 35-39 | 5.00 | 133560.00 |
National | person of color | Female | 40-44 | 5.00 | 130000.00 |
Foreign | unknown | Female | 30-34 | 5.00 | 128780.00 |
National | white | Female | 35-39 | 6.00 | 122735.00 |
Local | white | Female | 50-54 | 5.00 | 113280.00 |
Style | white | Female | 35-39 | 6.00 | 102552.82 |
Video | white | Male | 30-34 | 6.00 | 89750.00 |
Video | person of color | Female | 25-29 | 9.00 | 84000.00 |
Design | white | Female | 25-29 | 5.00 | 78000.00 |
Emerging News Products | white | Female | 25-29 | 5.00 | 72440.00 |
current_news_median_desk_race_group_gender_age5_hourly = news_hourly.groupby(['desk','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_race_group_gender_age5_hourly)
desk | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
current_news_median_desk_tier_salaried = news_salaried.groupby(['tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_salaried)
tier | count_nonzero | median |
---|---|---|
Tier 1 | 190 | 144973.59 |
other | 18 | 140840.00 |
Tier 2 | 227 | 110453.45 |
Tier 3 | 189 | 91625.25 |
Tier 4 | 33 | 78000.00 |
current_news_median_desk_tier_gender_salaried = news_salaried.groupby(['tier','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_gender_salaried)
tier | gender | count_nonzero | median |
---|---|---|---|
Tier 1 | Male | 103 | 155780.00 |
other | Male | 11 | 151780.00 |
Tier 1 | Female | 87 | 140780.00 |
other | Female | 7 | 129060.00 |
Tier 2 | Male | 120 | 116620.55 |
Tier 2 | Female | 106 | 105539.28 |
Tier 3 | Male | 77 | 96780.08 |
Tier 3 | Female | 112 | 89080.00 |
Tier 4 | Male | 9 | 78340.00 |
Tier 4 | Female | 24 | 77280.00 |
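
Tier medians and tier membership can move together, so it can help to look at who is in each tier alongside what each tier pays. The sketch below takes the Male and Female headcounts from the table above and computes the share of women in each tier. On the underlying frame, `pd.crosstab(news_salaried['tier'], news_salaried['gender'], normalize='index')` would give the same kind of table for every recorded gender value, without suppression. The shares here use only the Male and Female counts listed, so Tier 2's denominator (226) differs slightly from the 227 in the tier-only table.

```python
import pandas as pd

# Headcounts copied from the salaried tier-by-gender table above
counts = pd.DataFrame(
    {'Male': [103, 120, 77, 9, 11], 'Female': [87, 106, 112, 24, 7]},
    index=pd.Index(['Tier 1', 'Tier 2', 'Tier 3', 'Tier 4', 'other'], name='tier'),
)

# Share of women among the Male + Female members listed for each tier
counts['female_share'] = counts['Female'] / (counts['Male'] + counts['Female'])
print(counts.round(3))
```

In these counts the share of women rises from about 46% in Tier 1 to about 73% in Tier 4.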
current_news_median_desk_tier_race_salaried = news_salaried.groupby(['tier','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_salaried)
tier | race_ethnicity | count_nonzero | median |
---|---|---|---|
Tier 1 | White (United States of America) | 122 | 161670.00 |
other | White (United States of America) | 13 | 151780.00 |
Tier 1 | Black or African American (United States of America) | 12 | 141307.97 |
Tier 1 | Asian (United States of America) | 18 | 138960.00 |
Tier 1 | Hispanic or Latino (United States of America) | 6 | 131455.00 |
Tier 2 | White (United States of America) | 159 | 114669.36 |
Tier 2 | Black or African American (United States of America) | 23 | 113000.00 |
Tier 2 | Hispanic or Latino (United States of America) | 14 | 103710.00 |
Tier 1 | Prefer Not to Disclose (United States of America) | 5 | 102780.00 |
Tier 2 | Two or More Races (United States of America) | 8 | 96107.50 |
Tier 2 | Asian (United States of America) | 19 | 95000.00 |
Tier 3 | White (United States of America) | 121 | 94211.74 |
Tier 3 | Black or African American (United States of America) | 23 | 91280.34 |
Tier 3 | Hispanic or Latino (United States of America) | 18 | 90226.70 |
Tier 3 | Asian (United States of America) | 16 | 88140.00 |
Tier 4 | White (United States of America) | 22 | 78464.76 |
Tier 3 | Two or More Races (United States of America) | 6 | 75750.00 |
current_news_median_desk_tier_race_gender_salaried = news_salaried.groupby(['tier','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_gender_salaried)
tier | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
Tier 1 | White (United States of America) | Male | 72 | 173325.41 |
other | White (United States of America) | Male | 9 | 151840.00 |
Tier 1 | White (United States of America) | Female | 50 | 143210.02 |
Tier 1 | Asian (United States of America) | Female | 12 | 142890.00 |
Tier 1 | Black or African American (United States of America) | Male | 8 | 134550.00 |
Tier 1 | Asian (United States of America) | Male | 6 | 129735.00 |
Tier 2 | Hispanic or Latino (United States of America) | Male | 8 | 127582.40 |
Tier 2 | Black or African American (United States of America) | Male | 12 | 121454.57 |
Tier 2 | White (United States of America) | Male | 91 | 116681.11 |
Tier 2 | White (United States of America) | Female | 67 | 108840.00 |
Tier 2 | Black or African American (United States of America) | Female | 11 | 104340.00 |
Tier 2 | Asian (United States of America) | Male | 6 | 101555.00 |
Tier 3 | White (United States of America) | Male | 52 | 99349.45 |
Tier 2 | Hispanic or Latino (United States of America) | Female | 6 | 99280.00 |
Tier 2 | Asian (United States of America) | Female | 13 | 95000.00 |
Tier 3 | Hispanic or Latino (United States of America) | Male | 9 | 94780.00 |
Tier 3 | Black or African American (United States of America) | Male | 9 | 92752.50 |
Tier 2 | Two or More Races (United States of America) | Female | 6 | 92590.00 |
Tier 3 | Asian (United States of America) | Female | 13 | 90780.00 |
Tier 3 | Hispanic or Latino (United States of America) | Female | 9 | 90000.00 |
Tier 3 | White (United States of America) | Female | 69 | 89280.00 |
Tier 3 | Black or African American (United States of America) | Female | 14 | 89140.00 |
Tier 4 | White (United States of America) | Male | 5 | 79060.00 |
Tier 4 | White (United States of America) | Female | 17 | 76560.00 |
Tier 3 | Two or More Races (United States of America) | Female | 5 | 72500.00 |
current_news_median_desk_tier_race_group_gender_salaried = news_salaried.groupby(['tier','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_group_gender_salaried)
tier | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
Tier 1 | white | Male | 72 | 173325.41 |
other | white | Male | 9 | 151840.00 |
Tier 1 | unknown | Male | 15 | 144340.00 |
Tier 1 | white | Female | 50 | 143210.02 |
Tier 1 | person of color | Female | 22 | 138960.00 |
Tier 1 | unknown | Female | 15 | 136560.00 |
Tier 1 | person of color | Male | 16 | 135050.00 |
Tier 2 | white | Male | 91 | 116681.11 |
Tier 2 | person of color | Male | 28 | 115355.00 |
Tier 2 | white | Female | 67 | 108840.00 |
Tier 2 | person of color | Female | 37 | 102692.61 |
Tier 3 | white | Male | 52 | 99349.45 |
Tier 3 | person of color | Male | 23 | 92500.00 |
Tier 3 | white | Female | 69 | 89280.00 |
Tier 3 | person of color | Female | 41 | 88280.00 |
Tier 4 | white | Male | 5 | 79060.00 |
Tier 4 | white | Female | 17 | 76560.00 |
Tier 4 | person of color | Female | 6 | 74000.00 |
current_news_median_desk_tier_race_gender_age5_salaried = news_salaried.groupby(['tier','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_gender_age5_salaried)
tier | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
Tier 1 | White (United States of America) | Male | 65+ | 5.00 | 192244.02 |
Tier 1 | White (United States of America) | Male | 50-54 | 7.00 | 191325.00 |
Tier 1 | White (United States of America) | Male | 60-64 | 8.00 | 182534.24 |
Tier 1 | White (United States of America) | Male | 35-39 | 16.00 | 179875.00 |
Tier 1 | White (United States of America) | Male | 45-49 | 6.00 | 176422.50 |
Tier 1 | White (United States of America) | Female | 50-54 | 7.00 | 175340.00 |
Tier 1 | White (United States of America) | Male | 40-44 | 15.00 | 174409.31 |
Tier 1 | White (United States of America) | Female | 45-49 | 7.00 | 167910.00 |
Tier 2 | White (United States of America) | Female | 60-64 | 7.00 | 155522.35 |
Tier 1 | White (United States of America) | Female | 55-59 | 6.00 | 153515.00 |
Tier 2 | White (United States of America) | Male | 55-59 | 15.00 | 150101.57 |
Tier 2 | White (United States of America) | Male | 50-54 | 14.00 | 133726.66 |
Tier 1 | White (United States of America) | Female | 30-34 | 5.00 | 131910.00 |
Tier 2 | White (United States of America) | Male | 40-44 | 7.00 | 130000.00 |
Tier 2 | White (United States of America) | Male | 60-64 | 9.00 | 127410.00 |
Tier 2 | White (United States of America) | Male | 35-39 | 14.00 | 127375.96 |
Tier 1 | White (United States of America) | Female | 35-39 | 9.00 | 124440.00 |
Tier 1 | White (United States of America) | Male | 30-34 | 8.00 | 121895.00 |
Tier 2 | White (United States of America) | Female | 45-49 | 6.00 | 116920.00 |
Tier 2 | White (United States of America) | Male | 65+ | 6.00 | 116906.05 |
Tier 1 | White (United States of America) | Female | 25-29 | 7.00 | 116780.00 |
Tier 2 | White (United States of America) | Male | 45-49 | 11.00 | 114669.36 |
Tier 3 | White (United States of America) | Female | 45-49 | 5.00 | 113809.68 |
Tier 2 | White (United States of America) | Female | 50-54 | 7.00 | 113280.00 |
Tier 3 | White (United States of America) | Female | 55-59 | 5.00 | 111903.66 |
Tier 3 | White (United States of America) | Male | 60-64 | 6.00 | 111679.00 |
Tier 2 | White (United States of America) | Female | 55-59 | 6.00 | 109592.13 |
Tier 2 | White (United States of America) | Female | 35-39 | 12.00 | 109217.50 |
Tier 2 | White (United States of America) | Female | 40-44 | 5.00 | 107162.62 |
Tier 3 | White (United States of America) | Male | 40-44 | 5.00 | 106340.00 |
Tier 3 | White (United States of America) | Male | 50-54 | 6.00 | 100659.11 |
Tier 2 | White (United States of America) | Female | 30-34 | 8.00 | 98810.08 |
Tier 3 | White (United States of America) | Female | 35-39 | 14.00 | 98416.89 |
Tier 2 | White (United States of America) | Male | 30-34 | 7.00 | 95209.71 |
Tier 3 | White (United States of America) | Male | 35-39 | 12.00 | 95000.00 |
Tier 3 | White (United States of America) | Female | 60-64 | 5.00 | 94884.19 |
Tier 3 | White (United States of America) | Male | 30-34 | 11.00 | 94000.00 |
Tier 3 | White (United States of America) | Female | 50-54 | 7.00 | 93840.00 |
Tier 2 | White (United States of America) | Female | 25-29 | 10.00 | 91670.00 |
Tier 2 | Asian (United States of America) | Female | 25-29 | 5.00 | 90000.00 |
Tier 3 | Black or African American (United States of America) | Female | 30-34 | 5.00 | 90000.00 |
Tier 3 | White (United States of America) | Female | 30-34 | 13.00 | 89280.00 |
Tier 3 | Asian (United States of America) | Female | 25-29 | 7.00 | 85000.00 |
Tier 4 | White (United States of America) | Female | 30-34 | 5.00 | 84160.00 |
Tier 2 | White (United States of America) | Male | 25-29 | 7.00 | 80780.00 |
Tier 3 | White (United States of America) | Female | 25-29 | 17.00 | 79060.00 |
Tier 3 | White (United States of America) | Male | 25-29 | 5.00 | 78060.00 |
Tier 3 | Black or African American (United States of America) | Female | 25-29 | 5.00 | 73780.00 |
Tier 4 | White (United States of America) | Female | 25-29 | 7.00 | 73440.00 |
current_news_median_desk_tier_race_group_gender_age5_salaried = news_salaried.groupby(['tier','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_desk_tier_race_group_gender_age5_salaried)
tier | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
Tier 1 | white | Male | 65+ | 5.00 | 192244.02 |
Tier 1 | white | Male | 50-54 | 7.00 | 191325.00 |
Tier 1 | white | Male | 60-64 | 8.00 | 182534.24 |
Tier 1 | white | Male | 35-39 | 16.00 | 179875.00 |
Tier 1 | white | Male | 45-49 | 6.00 | 176422.50 |
Tier 1 | white | Female | 50-54 | 7.00 | 175340.00 |
Tier 1 | white | Male | 40-44 | 15.00 | 174409.31 |
Tier 1 | white | Female | 45-49 | 7.00 | 167910.00 |
Tier 1 | person of color | Male | 35-39 | 5.00 | 158340.00 |
Tier 2 | white | Female | 60-64 | 7.00 | 155522.35 |
Tier 1 | white | Female | 55-59 | 6.00 | 153515.00 |
Tier 2 | white | Male | 55-59 | 15.00 | 150101.57 |
Tier 1 | person of color | Female | 40-44 | 7.00 | 137140.00 |
Tier 1 | unknown | Female | 35-39 | 5.00 | 136560.00 |
Tier 2 | person of color | Male | 40-44 | 7.00 | 136340.00 |
Tier 2 | white | Male | 50-54 | 14.00 | 133726.66 |
Tier 1 | unknown | Male | 35-39 | 5.00 | 133560.00 |
Tier 1 | white | Female | 30-34 | 5.00 | 131910.00 |
Tier 2 | white | Male | 40-44 | 7.00 | 130000.00 |
Tier 1 | unknown | Female | 30-34 | 5.00 | 128780.00 |
Tier 2 | person of color | Female | 40-44 | 5.00 | 128280.00 |
Tier 2 | white | Male | 60-64 | 9.00 | 127410.00 |
Tier 2 | white | Male | 35-39 | 14.00 | 127375.96 |
Tier 1 | white | Female | 35-39 | 9.00 | 124440.00 |
Tier 2 | person of color | Male | 50-54 | 9.00 | 122909.15 |
Tier 1 | white | Male | 30-34 | 8.00 | 121895.00 |
Tier 2 | white | Female | 45-49 | 6.00 | 116920.00 |
Tier 2 | white | Male | 65+ | 6.00 | 116906.05 |
Tier 1 | white | Female | 25-29 | 7.00 | 116780.00 |
Tier 2 | white | Male | 45-49 | 11.00 | 114669.36 |
Tier 3 | white | Female | 45-49 | 5.00 | 113809.68 |
Tier 2 | white | Female | 50-54 | 7.00 | 113280.00 |
Tier 3 | white | Female | 55-59 | 5.00 | 111903.66 |
Tier 3 | white | Male | 60-64 | 6.00 | 111679.00 |
Tier 2 | white | Female | 55-59 | 6.00 | 109592.13 |
Tier 2 | white | Female | 35-39 | 12.00 | 109217.50 |
Tier 2 | white | Female | 40-44 | 5.00 | 107162.62 |
Tier 3 | white | Male | 40-44 | 5.00 | 106340.00 |
Tier 2 | person of color | Female | 35-39 | 7.00 | 104340.00 |
Tier 2 | person of color | Female | 30-34 | 5.00 | 100780.00 |
Tier 3 | white | Male | 50-54 | 6.00 | 100659.11 |
Tier 2 | white | Female | 30-34 | 8.00 | 98810.08 |
Tier 3 | white | Female | 35-39 | 14.00 | 98416.89 |
Tier 2 | white | Male | 30-34 | 7.00 | 95209.71 |
Tier 3 | white | Male | 35-39 | 12.00 | 95000.00 |
Tier 3 | white | Female | 60-64 | 5.00 | 94884.19 |
Tier 3 | white | Male | 30-34 | 11.00 | 94000.00 |
Tier 3 | white | Female | 50-54 | 7.00 | 93840.00 |
Tier 2 | person of color | Female | 25-29 | 11.00 | 93060.00 |
Tier 3 | person of color | Female | 35-39 | 6.00 | 92420.00 |
Tier 2 | white | Female | 25-29 | 10.00 | 91670.00 |
Tier 3 | person of color | Female | 30-34 | 10.00 | 90004.94 |
Tier 3 | white | Female | 30-34 | 13.00 | 89280.00 |
Tier 4 | white | Female | 30-34 | 5.00 | 84160.00 |
Tier 3 | person of color | Female | 25-29 | 18.00 | 80785.00 |
Tier 2 | white | Male | 25-29 | 7.00 | 80780.00 |
Tier 3 | white | Female | 25-29 | 17.00 | 79060.00 |
Tier 3 | white | Male | 25-29 | 5.00 | 78060.00 |
Tier 4 | white | Female | 25-29 | 7.00 | 73440.00 |
current_news_median_job_salaried = news_salaried.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_salaried)
job_profile_current | count_nonzero | median |
---|---|---|
300313 - Columnist - Editorial | 13 | 225000.00 |
300113 - Columnist | 14 | 182044.43 |
320113 - Critic | 9 | 155522.35 |
330113 - Editorial Writer | 7 | 138191.93 |
280212 - Staff Writer | 351 | 129690.00 |
390510 - Graphics Editor | 6 | 118747.00 |
360114 - Photographer | 18 | 112837.62 |
370301 - Librarian | 5 | 110780.00 |
126902 - Topic Editor | 6 | 110581.73 |
392210 - Multiplatform Editor - Editorial | 5 | 105215.73 |
280226 - Video Journalist | 19 | 104000.00 |
390610 - Graphics Reporter | 17 | 100000.00 |
120202 - Assistant Editor | 29 | 91060.00 |
390110 - Multiplatform Editor | 63 | 87485.60 |
280228 - Designer | 40 | 85030.00 |
126202 - Photo Editor | 8 | 84021.88 |
390410 - Digital Video Editor | 23 | 82000.00 |
120302 - Assistant Editor - Editorial | 7 | 80060.00 |
current_news_median_job_hourly = news_hourly.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_hourly)
job_profile_current | count_nonzero | median |
---|---|---|
397210 - Multiplatform Editor - Editorial (PT/PTOC) | 5 | 51.28 |
581709 - Administrative Assistant | 6 | 38.34 |
280225 - Producer | 11 | 37.44 |
397110 - Multiplatform Editor (PT/PTOC) | 15 | 35.00 |
410251 - Editorial Aide | 12 | 23.02 |
430117 - News Aide | 10 | 17.23 |
current_news_median_job_gender_salaried = news_salaried.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_gender_salaried)
job_profile_current | gender | count_nonzero | median |
---|---|---|---|
300313 - Columnist - Editorial | Male | 9 | 225000.00 |
300113 - Columnist | Female | 8 | 184898.40 |
300113 - Columnist | Male | 6 | 182044.43 |
320113 - Critic | Male | 6 | 158340.00 |
280212 - Staff Writer | Male | 184 | 136095.00 |
280212 - Staff Writer | Female | 167 | 121910.00 |
360114 - Photographer | Male | 13 | 114488.29 |
280226 - Video Journalist | Male | 8 | 107000.00 |
390610 - Graphics Reporter | Male | 6 | 100000.00 |
280226 - Video Journalist | Female | 11 | 100000.00 |
390610 - Graphics Reporter | Female | 10 | 99670.00 |
360114 - Photographer | Female | 5 | 98200.00 |
120202 - Assistant Editor | Male | 13 | 97340.00 |
280228 - Designer | Male | 14 | 93640.00 |
390110 - Multiplatform Editor | Male | 28 | 92295.14 |
120202 - Assistant Editor | Female | 16 | 90030.00 |
390410 - Digital Video Editor | Male | 8 | 85100.00 |
390110 - Multiplatform Editor | Female | 35 | 84220.51 |
280228 - Designer | Female | 26 | 80785.00 |
390410 - Digital Video Editor | Female | 15 | 79000.00 |
current_news_median_job_gender_hourly = news_hourly.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_gender_hourly)
job_profile_current | gender | count_nonzero | median |
---|---|---|---|
581709 - Administrative Assistant | Female | 6 | 38.34 |
280225 - Producer | Female | 8 | 36.67 |
397110 - Multiplatform Editor (PT/PTOC) | Female | 5 | 35.00 |
397110 - Multiplatform Editor (PT/PTOC) | Male | 10 | 34.79 |
410251 - Editorial Aide | Female | 8 | 21.16 |
430117 - News Aide | Male | 7 | 17.30 |
current_news_median_job_race_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_salaried)
job_profile_current | race_ethnicity | count_nonzero | median |
---|---|---|---|
300313 - Columnist - Editorial | White (United States of America) | 9 | 234560.00 |
300113 - Columnist | White (United States of America) | 9 | 187340.00 |
320113 - Critic | White (United States of America) | 8 | 153431.17 |
330113 - Editorial Writer | White (United States of America) | 6 | 142293.98 |
280212 - Staff Writer | White (United States of America) | 237 | 130189.42 |
280212 - Staff Writer | Black or African American (United States of America) | 26 | 125350.00 |
280212 - Staff Writer | Asian (United States of America) | 31 | 124345.00 |
280226 - Video Journalist | White (United States of America) | 12 | 114500.00 |
360114 - Photographer | White (United States of America) | 12 | 113790.63 |
280212 - Staff Writer | Hispanic or Latino (United States of America) | 17 | 105780.00 |
392210 - Multiplatform Editor - Editorial | White (United States of America) | 5 | 105215.73 |
280212 - Staff Writer | Prefer Not to Disclose (United States of America) | 5 | 102780.00 |
390610 - Graphics Reporter | White (United States of America) | 8 | 102390.00 |
120202 - Assistant Editor | White (United States of America) | 18 | 97170.00 |
280212 - Staff Writer | Two or More Races (United States of America) | 9 | 93780.00 |
280228 - Designer | Hispanic or Latino (United States of America) | 6 | 91254.94 |
390110 - Multiplatform Editor | White (United States of America) | 48 | 88662.80 |
126202 - Photo Editor | White (United States of America) | 6 | 88420.00 |
390110 - Multiplatform Editor | Black or African American (United States of America) | 6 | 87530.26 |
390410 - Digital Video Editor | White (United States of America) | 11 | 85000.00 |
280228 - Designer | White (United States of America) | 22 | 82130.76 |
280228 - Designer | Asian (United States of America) | 6 | 80785.00 |
390410 - Digital Video Editor | Hispanic or Latino (United States of America) | 5 | 78000.00 |
current_news_median_job_race_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_hourly)
job_profile_current | race_ethnicity | count_nonzero | median |
---|---|---|---|
280225 - Producer | White (United States of America) | 6 | 38.23 |
397110 - Multiplatform Editor (PT/PTOC) | White (United States of America) | 12 | 34.79 |
410251 - Editorial Aide | White (United States of America) | 7 | 25.64 |
430117 - News Aide | White (United States of America) | 8 | 16.77 |
current_news_median_job_race_gender_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_salaried)
job_profile_current | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
300113 - Columnist | White (United States of America) | Female | 5 | 230020.51 |
300313 - Columnist - Editorial | White (United States of America) | Male | 6 | 229780.00 |
320113 - Critic | White (United States of America) | Male | 6 | 158340.00 |
280212 - Staff Writer | Hispanic or Latino (United States of America) | Male | 7 | 154710.00 |
White (United States of America) | Male | 135 | 139410.00 | |
Black or African American (United States of America) | Male | 17 | 129951.95 | |
Asian (United States of America) | Male | 10 | 124952.50 | |
White (United States of America) | Female | 102 | 124720.00 | |
Asian (United States of America) | Female | 21 | 121910.00 | |
360114 - Photographer | White (United States of America) | Male | 7 | 120816.68 |
280212 - Staff Writer | Black or African American (United States of America) | Female | 9 | 117924.49 |
280226 - Video Journalist | White (United States of America) | Female | 6 | 117890.00 |
Male | 6 | 112000.00 | ||
120202 - Assistant Editor | White (United States of America) | Male | 9 | 100056.00 |
360114 - Photographer | White (United States of America) | Female | 5 | 98200.00 |
280212 - Staff Writer | Hispanic or Latino (United States of America) | Female | 10 | 97780.00 |
390110 - Multiplatform Editor | White (United States of America) | Male | 21 | 94211.74 |
280212 - Staff Writer | Two or More Races (United States of America) | Female | 9 | 93780.00 |
280228 - Designer | White (United States of America) | Male | 6 | 92500.00 |
120202 - Assistant Editor | White (United States of America) | Female | 9 | 90060.00 |
390410 - Digital Video Editor | White (United States of America) | Female | 5 | 85000.00 |
390110 - Multiplatform Editor | White (United States of America) | Female | 27 | 84340.00 |
390410 - Digital Video Editor | White (United States of America) | Male | 6 | 81250.00 |
280228 - Designer | White (United States of America) | Female | 16 | 78140.47 |
current_news_median_job_race_gender_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_hourly)
job_profile_current | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
280225 - Producer | White (United States of America) | Female | 5 | 38.46 |
397110 - Multiplatform Editor (PT/PTOC) | White (United States of America) | Male | 10 | 34.79 |
410251 - Editorial Aide | White (United States of America) | Female | 5 | 25.64 |
430117 - News Aide | White (United States of America) | Male | 6 | 16.85 |
current_news_median_job_race_group_gender_salaried = news_salaried.groupby(['desk','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_salaried)
desk | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
National | white | Male | 47 | 186065.00 |
Sports | white | Female | 5 | 160089.98 |
Editorial | white | Male | 15 | 158673.36 |
non-newsroom | white | Male | 9 | 151840.00 |
National | white | Female | 36 | 148550.00 |
Foreign | unknown | Male | 12 | 147670.00 |
Financial | white | Male | 23 | 142027.50 |
Financial | person of color | Female | 6 | 141070.00 |
National | person of color | Female | 16 | 137890.00 |
Foreign | unknown | Female | 13 | 136560.00 |
National | person of color | Male | 12 | 135050.00 |
Financial | white | Female | 10 | 129876.00 |
Local | person of color | Male | 7 | 122909.15 |
Local | white | Male | 25 | 121113.20 |
Photography | white | Male | 9 | 119340.00 |
Graphics | white | Male | 5 | 117131.00 |
Style | white | Male | 18 | 116175.06 |
Style | white | Female | 20 | 111833.47 |
Sports | person of color | Male | 9 | 110560.00 |
Local | white | Female | 24 | 108075.80 |
Editorial | white | Female | 10 | 107200.36 |
Local | person of color | Female | 12 | 106890.00 |
Graphics | white | Female | 5 | 104780.00 |
Graphics | person of color | Female | 5 | 104320.14 |
Audience Development and Engagement | white | Male | 7 | 103060.00 |
Sports | white | Male | 24 | 102653.66 |
Multiplatform | white | Male | 13 | 96600.08 |
Style | person of color | Female | 8 | 96353.74 |
Sports | person of color | Female | 5 | 95000.00 |
Video | white | Female | 13 | 95000.00 |
Photography | person of color | Male | 7 | 95000.00 |
Video | white | Male | 14 | 94500.00 |
Design | person of color | Male | 7 | 92500.00 |
Design | white | Male | 6 | 92500.00 |
Photography | white | Female | 10 | 91847.57 |
Audience Development and Engagement | person of color | Female | 9 | 90000.00 |
Multiplatform | white | Female | 23 | 89840.00 |
Audience Development and Engagement | white | Female | 10 | 86330.00 |
Design | person of color | Female | 8 | 85789.94 |
Multiplatform | person of color | Female | 5 | 84220.51 |
Video | person of color | Female | 15 | 84000.00 |
Design | white | Female | 11 | 82201.52 |
Emerging News Products | white | Female | 14 | 72940.00 |
current_news_median_job_race_group_gender_hourly = news_hourly.groupby(['job_profile_current','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_hourly)
job_profile_current | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
280225 - Producer | white | Female | 5 | 38.46 |
397110 - Multiplatform Editor (PT/PTOC) | white | Male | 10 | 34.79 |
410251 - Editorial Aide | white | Female | 5 | 25.64 |
430117 - News Aide | white | Male | 6 | 16.85 |
current_news_median_job_race_gender_age5_salaried = news_salaried.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_age5_salaried)
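job_profile_current | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
280212 - Staff Writer | White (United States of America) | Male | 60-64 | 16.00 | 167440.92 |
280212 - Staff Writer | White (United States of America) | Female | 65+ | 5.00 | 162355.42 |
280212 - Staff Writer | White (United States of America) | Female | 60-64 | 7.00 | 160089.98 |
280212 - Staff Writer | White (United States of America) | Female | 45-49 | 10.00 | 157910.00 |
280212 - Staff Writer | White (United States of America) | Male | 35-39 | 29.00 | 151780.00 |
280212 - Staff Writer | White (United States of America) | Male | 50-54 | 17.00 | 144410.00 |
280212 - Staff Writer | White (United States of America) | Male | 40-44 | 21.00 | 144340.00 |
280212 - Staff Writer | White (United States of America) | Female | 55-59 | 7.00 | 144190.00 |
280212 - Staff Writer | White (United States of America) | Male | 55-59 | 10.00 | 134154.41 |
280212 - Staff Writer | White (United States of America) | Male | 45-49 | 12.00 | 133814.40 |
280212 - Staff Writer | Black or African American (United States of America) | Male | 35-39 | 5.00 | 132910.00 |
280212 - Staff Writer | White (United States of America) | Female | 50-54 | 12.00 | 129625.00 |
280212 - Staff Writer | White (United States of America) | Male | 65+ | 8.00 | 122430.55 |
280212 - Staff Writer | White (United States of America) | Female | 30-34 | 10.00 | 121280.00 |
280212 - Staff Writer | White (United States of America) | Female | 35-39 | 20.00 | 117690.00 |
280212 - Staff Writer | White (United States of America) | Female | 40-44 | 11.00 | 115780.00 |
280212 - Staff Writer | Asian (United States of America) | Female | 35-39 | 6.00 | 113060.00 |
280212 - Staff Writer | White (United States of America) | Male | 30-34 | 13.00 | 104690.00 |
390110 - Multiplatform Editor | White (United States of America) | Male | 50-54 | 5.00 | 98150.00 |
280212 - Staff Writer | White (United States of America) | Male | 25-29 | 8.00 | 90670.00 |
280212 - Staff Writer | White (United States of America) | Female | 25-29 | 17.00 | 90000.00 |
280228 - Designer | White (United States of America) | Female | 25-29 | 6.00 | 77750.00 |
390110 - Multiplatform Editor | White (United States of America) | Female | 25-29 | 6.00 | 72940.00 |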
current_news_median_job_race_gender_age5_hourly = news_hourly.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_gender_age5_hourly)
job_profile_current | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
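The hourly breakdown by job, race, gender and age band comes back with headers but no rows. The suppression helpers (`suppress`, `suppress_median`, `suppress_count`) are defined outside this section, so the following is only an assumption about their behavior: the smallest count that survives in any of these tables is 5, which is consistent with dropping groups of fewer than five people. A minimal sketch of that kind of small-cell suppression, using a hypothetical helper `suppress_small_cells` that is not the notebook's own function:

```python
import pandas as pd

def suppress_small_cells(table, count_col, min_count=5, sort_col=None):
    """Hypothetical helper: hide grouped rows whose count is below min_count.

    table     -- output of groupby(...).agg(...), possibly with MultiIndex columns
    count_col -- label of the count column, e.g. ('current_base_pay', 'count')
    sort_col  -- optional column label to sort by, largest first
    """
    kept = table[table[count_col] >= min_count]
    if sort_col is not None:
        # Wrap the label in a list so a tuple label is treated as one column.
        kept = kept.sort_values(by=[sort_col], ascending=False)
    return kept

# Invented example: six women and three men.
toy = pd.DataFrame({
    'gender': ['Female'] * 6 + ['Male'] * 3,
    'current_base_pay': [90, 95, 100, 105, 110, 115, 120, 125, 130],
})
grouped = toy.groupby('gender').agg({'current_base_pay': ['count', 'median']})
shown = suppress_small_cells(grouped, ('current_base_pay', 'count'),
                             sort_col=('current_base_pay', 'median'))
print(shown)  # only the Female row (6 people) survives
```

Under this assumption, a grouping with four index levels splits the small hourly staff so finely that every cell can fall below the threshold, which would produce exactly the empty output shown.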
current_news_median_job_race_group_gender_age5_salaried = news_salaried.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_age5_salaried)
job_profile_current | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
280212 - Staff Writer | white | Male | 60-64 | 16.00 | 167440.92 |
280212 - Staff Writer | white | Female | 65+ | 5.00 | 162355.42 |
280212 - Staff Writer | white | Female | 60-64 | 7.00 | 160089.98 |
280212 - Staff Writer | white | Female | 45-49 | 10.00 | 157910.00 |
280212 - Staff Writer | white | Male | 35-39 | 29.00 | 151780.00 |
280212 - Staff Writer | white | Male | 50-54 | 17.00 | 144410.00 |
280212 - Staff Writer | white | Male | 40-44 | 21.00 | 144340.00 |
280212 - Staff Writer | white | Female | 55-59 | 7.00 | 144190.00 |
280212 - Staff Writer | unknown | Female | 35-39 | 5.00 | 136560.00 |
280212 - Staff Writer | white | Male | 55-59 | 10.00 | 134154.41 |
280212 - Staff Writer | white | Male | 45-49 | 12.00 | 133814.40 |
280212 - Staff Writer | unknown | Male | 35-39 | 5.00 | 133560.00 |
280212 - Staff Writer | person of color | Male | 35-39 | 9.00 | 132910.00 |
280212 - Staff Writer | person of color | Male | 50-54 | 8.00 | 132700.35 |
280212 - Staff Writer | person of color | Male | 40-44 | 6.00 | 131735.00 |
280212 - Staff Writer | person of color | Female | 40-44 | 9.00 | 131090.00 |
280212 - Staff Writer | white | Female | 50-54 | 12.00 | 129625.00 |
280212 - Staff Writer | unknown | Female | 30-34 | 5.00 | 128780.00 |
280212 - Staff Writer | person of color | Female | 30-34 | 8.00 | 127845.00 |
280212 - Staff Writer | white | Male | 65+ | 8.00 | 122430.55 |
280212 - Staff Writer | white | Female | 30-34 | 10.00 | 121280.00 |
280212 - Staff Writer | white | Female | 35-39 | 20.00 | 117690.00 |
280212 - Staff Writer | white | Female | 40-44 | 11.00 | 115780.00 |
280212 - Staff Writer | white | Male | 30-34 | 13.00 | 104690.00 |
390110 - Multiplatform Editor | white | Male | 50-54 | 5.00 | 98150.00 |
280212 - Staff Writer | person of color | Female | 35-39 | 13.00 | 94780.00 |
280212 - Staff Writer | person of color | Female | 25-29 | 9.00 | 93060.00 |
280212 - Staff Writer | white | Male | 25-29 | 8.00 | 90670.00 |
280212 - Staff Writer | white | Female | 25-29 | 17.00 | 90000.00 |
280228 - Designer | person of color | Female | 25-29 | 6.00 | 80785.00 |
390410 - Digital Video Editor | person of color | Female | 25-29 | 6.00 | 78500.00 |
280228 - Designer | white | Female | 25-29 | 6.00 | 77750.00 |
390110 - Multiplatform Editor | white | Female | 25-29 | 6.00 | 72940.00 |
current_news_median_job_race_group_gender_age5_hourly = news_hourly.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_news_median_job_race_group_gender_age5_hourly)
job_profile_current | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
news_ratings = ratings_combined[ratings_combined['dept'] == 'News']
news_ratings_gender = news_ratings.groupby(['gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(news_ratings_gender)
gender | count_nonzero | median |
---|---|---|
Male | 5772 | 3.50 |
Female | 6539 | 3.40 |
Prefer not to disclose | 26 | 3.20 |
news_ratings_race = news_ratings.groupby(['race_ethnicity']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(news_ratings_race)
race_ethnicity | count_nonzero | median |
---|---|---|
American Indian or Alaska Native (United States of America) | 39 | 3.50 |
White (United States of America) | 8138 | 3.50 |
Asian (United States of America) | 1222 | 3.40 |
Hispanic or Latino (United States of America) | 715 | 3.40 |
Prefer Not to Disclose (United States of America) | 182 | 3.40 |
Black or African American (United States of America) | 1287 | 3.30 |
Native Hawaiian or Other Pacific Islander (United States of America) | 13 | 3.30 |
Two or More Races (United States of America) | 338 | 3.30 |
news_ratings_race_gender = news_ratings.groupby(['race_ethnicity','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
American Indian or Alaska Native (United States of America) | Female | 26 | 3.60 |
American Indian or Alaska Native (United States of America) | Male | 13 | 3.20 |
Asian (United States of America) | Female | 884 | 3.40 |
Asian (United States of America) | Male | 338 | 3.40 |
Black or African American (United States of America) | Female | 676 | 3.30 |
Black or African American (United States of America) | Male | 611 | 3.30 |
Hispanic or Latino (United States of America) | Female | 351 | 3.40 |
Hispanic or Latino (United States of America) | Male | 364 | 3.40 |
Native Hawaiian or Other Pacific Islander (United States of America) | Male | 13 | 3.30 |
Prefer Not to Disclose (United States of America) | Female | 78 | 3.60 |
Prefer Not to Disclose (United States of America) | Male | 104 | 3.40 |
Two or More Races (United States of America) | Female | 247 | 3.30 |
Two or More Races (United States of America) | Male | 91 | 3.20 |
White (United States of America) | Female | 4043 | 3.50 |
White (United States of America) | Male | 4069 | 3.50 |
White (United States of America) | Prefer not to disclose | 26 | 3.20 |
news_ratings_race_gender_under3 = news_ratings[news_ratings['performance_rating'] < 3.1].groupby(['race_grouping','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender_under3)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
person of color | Female | 109 | 3.00 |
person of color | Male | 112 | 3.00 |
unknown | Female | 5 | 3.00 |
unknown | Male | 9 | 2.90 |
white | Female | 195 | 3.00 |
white | Male | 169 | 3.00 |
news_ratings_race_gender_over4 = news_ratings[news_ratings['performance_rating'] > 3.9].groupby(['race_grouping','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_ratings_race_gender_over4)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
person of color | Female | 50 | 4.10 |
person of color | Male | 22 | 4.10 |
unknown | Female | 8 | 4.10 |
unknown | Male | 26 | 4.10 |
white | Female | 205 | 4.10 |
white | Male | 320 | 4.20 |
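The two tables above report raw counts of low and high ratings, but the groups being compared differ widely in size (the race table earlier lists 8,138 rating records for White employees and 1,287 for Black employees), so counts alone do not show which group is more likely to land in either band. A minimal sketch that turns the same cutoffs (`< 3.1` and `> 3.9`, reused from the code above) into within-group shares; the toy ratings are invented.

```python
import pandas as pd

# Invented ratings; only the column names mirror news_ratings.
ratings = pd.DataFrame({
    'race_grouping': ['white'] * 5 + ['person of color'] * 4,
    'gender': ['Male'] * 5 + ['Female'] * 4,
    'performance_rating': [4.2, 4.0, 3.5, 3.0, 3.6, 3.4, 4.1, 3.0, 3.3],
})

# Flag each record with the same cutoffs used for the count tables.
ratings['low'] = ratings['performance_rating'] < 3.1
ratings['high'] = ratings['performance_rating'] > 3.9

# The mean of a boolean column is the share of True values within the group.
keys = ['race_grouping', 'gender']
shares = ratings.groupby(keys)[['low', 'high']].mean()
shares['n'] = ratings.groupby(keys).size()
print(shares)
```

Keeping `n` beside each share makes it easy to see when a share rests on only a handful of records.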
news_change = reason_for_change_combined[reason_for_change_combined['dept'] == 'News']
news_change_gender = news_change.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(news_change_gender)
business_process_reason | gender | count_nonzero |
---|---|---|
Request Compensation Change > Adjustment > Contract Increase | Female | 1449 |
Request Compensation Change > Adjustment > Contract Increase | Male | 1445 |
Merit > Performance > Annual Performance Appraisal | Male | 1029 |
Merit > Performance > Annual Performance Appraisal | Female | 962 |
Request Compensation Change > Adjustment > Change Plan Assignment | Female | 440 |
Request Compensation Change > Adjustment > Change Plan Assignment | Male | 401 |
Data Change > Data Change > Change Job Details | Female | 346 |
Data Change > Data Change > Change Job Details | Male | 308 |
Request Compensation Change > Adjustment > Market Adjustment | Female | 292 |
Transfer > Transfer > Move to another manager | Male | 203 |
Request Compensation Change > Adjustment > Market Adjustment | Male | 191 |
Promotion > Promotion > Promotion | Female | 148 |
Hire Employee > New Hire > Fill Vacancy | Female | 115 |
Transfer > Transfer > Move to another manager | Female | 113 |
Promotion > Promotion > Promotion | Male | 101 |
Hire Employee > New Hire > New Position | Female | 99 |
Hire Employee > New Hire > Fill Vacancy | Male | 77 |
Hire Employee > New Hire > New Position | Male | 72 |
Request Compensation Change > Adjustment > Increased Job Responsibilities | Male | 30 |
Request Compensation Change > Adjustment > Job Change | Female | 28 |
Request Compensation Change > Adjustment > Job Change | Male | 24 |
Transfer > Transfer > Transfer between departments | Male | 24 |
Request Compensation Change > Adjustment > Increased Job Responsibilities | Female | 21 |
Transfer > Transfer > Transfer between departments | Female | 21 |
Hire Employee > New Hire > Convert Contingent | Female | 19 |
Lateral Move > Lateral Move > Move to Another Position | Male | 19 |
Request Compensation Change > Adjustment > Performance | Male | 15 |
Data Change > Data Change > Change Job Profile | Male | 12 |
Data Change > Data Change > Change Job Profile | Female | 12 |
Request Compensation Change > Adjustment > Performance | Female | 11 |
Hire Employee > Rehire > Fill Vacancy | Male | 9 |
Hire Employee > Rehire > Fill Vacancy | Female | 8 |
Hire Employee > New Hire > Convert Contingent | Male | 7 |
Hire Employee > Rehire > New Position | Female | 6 |
Request Compensation Change > Adjustment > Contract Increase | Prefer not to disclose | 5 |
Lateral Move > Lateral Move > Move to Another Position | Female | 5 |
Transfer > Transfer > Transfer between companies | Male | 5 |
news_change_race = news_change.groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(news_change_race)
business_process_reason | race_ethnicity | count_nonzero |
---|---|---|
Request Compensation Change > Adjustment > Contract Increase | White (United States of America) | 2057 |
Merit > Performance > Annual Performance Appraisal | White (United States of America) | 1457 |
Request Compensation Change > Adjustment > Change Plan Assignment | White (United States of America) | 565 |
Data Change > Data Change > Change Job Details | White (United States of America) | 448 |
Request Compensation Change > Adjustment > Market Adjustment | White (United States of America) | 309 |
... | ... | ... |
Data Change > Data Change > Change Job Profile | Black or African American (United States of America) | 5 |
Request Compensation Change > Adjustment > Contract Increase | Native Hawaiian or Other Pacific Islander (United States of America) | 5 |
Request Compensation Change > Adjustment > Job Change | Black or African American (United States of America) | 5 |
Request Compensation Change > Adjustment > Job Change | Asian (United States of America) | 5 |
Data Change > Data Change > Change Job Details | Prefer Not to Disclose (United States of America) | 5 |
66 rows × 1 columns
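The reason-for-change counts mix two effects: how often each kind of change happens, and how many people of each gender appear in the data. One way to separate them is to normalize within each gender, so that each column shows what share of that gender's recorded changes falls under each reason. A minimal sketch with `pd.crosstab(..., normalize='columns')`; the toy records are invented and only the column names mirror `news_change`.

```python
import pandas as pd

PROMO = 'Promotion > Promotion > Promotion'
MERIT = 'Merit > Performance > Annual Performance Appraisal'

# Invented change records: 3 for women, 4 for men.
changes = pd.DataFrame({
    'business_process_reason': [PROMO, MERIT, MERIT,          # Female
                                PROMO, MERIT, MERIT, MERIT],  # Male
    'gender': ['Female'] * 3 + ['Male'] * 4,
})

# normalize='columns' divides each cell by its column total,
# so every gender's column sums to 1.
shares = pd.crosstab(changes['business_process_reason'], changes['gender'],
                     normalize='columns')
print(shares.round(3))
```

On the real data, the same call over `news_change` would show, for example, whether promotions make up a larger share of women's or men's recorded changes, independent of headcount.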
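# Flag merit raises. The second positional argument of str.contains is `case`,
# not `flags`, so case-insensitivity must be requested with case=False.
# na=False treats a missing reason as "not a merit raise".
reason_for_change_combined['merit_raises'] = reason_for_change_combined['business_process_reason'].str.contains('Merit', case=False, na=False)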
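In `Series.str.contains`, the second positional parameter is `case`, not `flags`. A call like `.str.contains('Merit', re.IGNORECASE)` therefore passes `case=2`, which is truthy, so the search stays case-sensitive. Case-insensitive matching needs `case=False` or `flags=re.IGNORECASE` passed by keyword. A small check on a toy series:

```python
import re
import pandas as pd

reasons = pd.Series(['Merit > Performance > Annual Performance Appraisal',
                     'merit > performance > annual performance appraisal',
                     'Promotion > Promotion > Promotion'])

# Positional second argument is `case`; re.IGNORECASE (== 2) is truthy,
# so this is still a case-sensitive search.
positional = reasons.str.contains('Merit', re.IGNORECASE)

# Either keyword form is genuinely case-insensitive.
by_case = reasons.str.contains('Merit', case=False)
by_flags = reasons.str.contains('Merit', flags=re.IGNORECASE)

print(positional.tolist())  # [True, False, False]
print(by_case.tolist())     # [True, True, False]
print(by_flags.tolist())    # [True, True, False]
```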
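# Merit-raise windows run from April 1 to March 31. Each window is labelled
# with the year before it opens, e.g. '2015' covers 2016-04-01 to 2017-03-31.
# twenty14 is the start of the '2015' window; twenty18 is the end of '2018'.
twenty14 = np.datetime64('2016-04-01')
twenty15 = np.datetime64('2017-04-01')
twenty16 = np.datetime64('2018-04-01')
twenty17 = np.datetime64('2019-04-01')
twenty18 = np.datetime64('2020-04-01')

def raise_time(row):
    """Return the merit-raise window label for a row's effective_date.

    Dates on or after 2020-04-01, and missing dates (NaT), return 'unknown'.
    """
    if row['effective_date'] < twenty14:
        return 'before 2015'
    if row['effective_date'] < twenty15:
        return '2015'
    if row['effective_date'] < twenty16:
        return '2016'
    if row['effective_date'] < twenty17:
        return '2017'
    if row['effective_date'] < twenty18:
        return '2018'
    return 'unknown'

reason_for_change_combined['raise_after'] = reason_for_change_combined.apply(raise_time, axis=1)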
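`raise_time` is applied row by row, which is slow on large frames. A minimal vectorized sketch of the same windowing uses `np.select`, which picks the first condition that is true in list order, just as the chained `if` statements do. It reuses the cutoff dates from the cell above and assumes `effective_date` is a datetime64 column. Missing dates compare as False and fall through to `'unknown'`, as in the original function. The name `raise_window` is hypothetical.

```python
import numpy as np
import pandas as pd

# Same boundaries as twenty14 ... twenty18 above.
cutoffs = pd.to_datetime(['2016-04-01', '2017-04-01', '2018-04-01',
                          '2019-04-01', '2020-04-01'])
labels = ['before 2015', '2015', '2016', '2017', '2018']

def raise_window(dates):
    """Label each date with its merit-raise window; NaT and late dates -> 'unknown'."""
    # One boolean mask per cutoff; np.select uses the first mask that is True.
    conditions = [dates < c for c in cutoffs]
    return np.select(conditions, labels, default='unknown')

dates = pd.Series(pd.to_datetime(['2016-03-31', '2016-04-01', '2019-06-15',
                                  '2021-01-01', None]))
print(raise_window(dates))
```

On the notebook's frame, `reason_for_change_combined['raise_after'] = raise_window(reason_for_change_combined['effective_date'])` would stand in for the `apply` call.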
merit_raises_news_gender_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(merit_raises_news_gender_salaried)
gender | count_nonzero | median |
---|---|---|
Female | 742 | 3000.00 |
Male | 843 | 3000.00 |
merit_raises_news_gender_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(merit_raises_news_gender_hourly)
gender | count_nonzero | median |
---|---|---|
Female | 103 | 1.04 |
Male | 60 | 1.03 |
merit_raises_news_race_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_salaried)
race_ethnicity | count_nonzero | median |
---|---|---|
American Indian or Alaska Native (United States of America) | 7 | 3500.00 |
Asian (United States of America) | 116 | 3000.00 |
White (United States of America) | 1185 | 3000.00 |
Two or More Races (United States of America) | 16 | 2850.00 |
Black or African American (United States of America) | 142 | 2844.00 |
Prefer Not to Disclose (United States of America) | 12 | 2610.00 |
Hispanic or Latino (United States of America) | 77 | 2500.00 |
merit_raises_news_race_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_hourly)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 118 | 1.10 |
Asian (United States of America) | 21 | 1.06 |
Black or African American (United States of America) | 18 | 1.02 |
merit_raises_news_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_group_salaried)
race_grouping | count_nonzero | median |
---|---|---|
person of color | 360 | 3000.00 |
unknown | 42 | 3000.00 |
white | 1185 | 3000.00 |
merit_raises_news_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_race_group_hourly)
race_grouping | count_nonzero | median |
---|---|---|
white | 118 | 1.10 |
person of color | 45 | 1.03 |
merit_raises_news_gender_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_gender_race_group_salaried)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Male | unknown | 22 | 3750.00 |
Female | person of color | 198 | 3000.00 |
Female | unknown | 20 | 3000.00 |
Female | white | 524 | 3000.00 |
Male | person of color | 162 | 3000.00 |
Male | white | 659 | 3000.00 |
merit_raises_news_gender_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_news_gender_race_group_hourly)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Male | white | 42 | 1.12 |
Female | white | 76 | 1.10 |
Female | person of color | 27 | 1.03 |
Male | person of color | 18 | 1.03 |
fifteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(fifteen_raises_amount)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 16 | 2881.50 |
Female | white | 43 | 2500.00 |
Male | person of color | 10 | 2162.50 |
Male | white | 62 | 2848.50 |
fifteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises_score)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 16 | 3.40 |
Female | white | 43 | 3.70 |
Male | person of color | 10 | 3.50 |
Male | white | 62 | 3.60 |
sixteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(sixteen_raises_amount)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 24 | 2750.00 |
Female | white | 57 | 3000.00 |
Male | person of color | 16 | 2900.00 |
Male | white | 82 | 3000.00 |
sixteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises_score)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 24 | 3.40 |
Female | white | 57 | 3.50 |
Male | person of color | 16 | 3.40 |
Male | white | 82 | 3.60 |
seventeen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(seventeen_raises_amount)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 23 | 3000.00 |
Female | white | 56 | 2500.00 |
Male | person of color | 25 | 3000.00 |
Male | white | 79 | 3000.00 |
seventeen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises_score)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 23 | 3.45 |
Female | white | 56 | 3.40 |
Male | person of color | 25 | 3.40 |
Male | white | 79 | 3.60 |
eighteen_raises_amount = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(eighteen_raises_amount)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 25 | 3000.00 |
Female | white | 94 | 3000.00 |
Male | person of color | 25 | 2500.00 |
Male | white | 110 | 3000.00 |
eighteen_raises_score = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'News') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises_score)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 25 | 3.40 |
Female | white | 94 | 3.50 |
Male | person of color | 25 | 3.40 |
Male | white | 110 | 3.60 |
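Each year above takes two `groupby` calls, one for the raise amount and one for the rating. Named aggregation can return both in a single table, which puts each group's median raise and median rating side by side. A minimal sketch on invented data; the column names mirror `reason_for_change_combined`, and `rating_col` stands for whichever year's rating column applies.

```python
import pandas as pd

# Invented merit raises for one year; only the column names follow the notebook.
raises = pd.DataFrame({
    'gender': ['Female', 'Female', 'Male', 'Male', 'Male'],
    'race_grouping': ['white', 'white', 'white', 'white', 'person of color'],
    'base_pay_change': [2500.0, 3000.0, 3000.0, 3500.0, 2500.0],
    '2016_annual_performance_rating': [3.5, 3.7, 3.6, 3.8, 3.4],
})

rating_col = '2016_annual_performance_rating'
# One call: group size, median raise and median rating per group.
by_group = raises.groupby(['gender', 'race_grouping']).agg(
    n=('base_pay_change', 'size'),
    median_raise=('base_pay_change', 'median'),
    median_rating=(rating_col, 'median'),
)
print(by_group)
```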
merit_raises_15 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2015') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_16 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2016') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_17 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2017') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_18 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2018') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_15 = merit_raises_15[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
merit_raises_16 = merit_raises_16[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2016_annual_performance_rating']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
merit_raises_17 = merit_raises_17[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2017_annual_performance_rating']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
merit_raises_18 = merit_raises_18[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2018_annual_performance_rating']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
# The year slices are already DataFrames, so they can be stacked directly.
merit_raises_combined = pd.concat([merit_raises_15,merit_raises_16,merit_raises_17,merit_raises_18])
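The four near-identical slicing steps above differ only in the year. A loop over the years, sketched below on invented data, builds the same kind of stacked frame. It also adds a `year` column, which is not in the notebook, so that rows from different raise cycles remain distinguishable after `pd.concat`.

```python
import pandas as pd

# Invented rows; the real source is reason_for_change_combined.
changes = pd.DataFrame({
    'raise_after': ['2015', '2016', '2016', 'unknown'],
    'merit_raises': [True, True, False, True],
    'base_pay_change': [2500.0, 3000.0, 1000.0, 2000.0],
    '2015_annual_performance_rating': [3.5, 3.2, 3.0, 3.9],
    '2016_annual_performance_rating': [3.6, 3.4, 3.1, 4.0],
})

frames = []
for year in ['2015', '2016']:
    rating_col = f'{year}_annual_performance_rating'
    # Merit raises in this window, paired with that year's rating.
    mask = (changes['raise_after'] == year) & changes['merit_raises'].eq(True)
    subset = changes.loc[mask, ['base_pay_change', rating_col]]
    subset = subset.rename(columns={rating_col: 'performance_rating'})
    frames.append(subset.assign(year=year))

combined = pd.concat(frames)
print(combined)
```

`.eq(True)` mirrors the notebook's `== True` comparison, so a missing flag counts as not a merit raise.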
news_salaried_raises = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Salaried') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(news_salaried_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 88 | 3000.00 |
Female | unknown | 8 | 2860.00 |
Female | white | 250 | 2961.16 |
Male | person of color | 76 | 2500.00 |
Male | unknown | 6 | 3250.00 |
Male | white | 333 | 3000.00 |
news_salaried_raises_scores = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Salaried') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_salaried_raises_scores)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 88 | 3.40 |
Female | unknown | 8 | 4.00 |
Female | white | 250 | 3.50 |
Male | person of color | 76 | 3.40 |
Male | unknown | 6 | 3.75 |
Male | white | 333 | 3.60 |
news_hourly_raises = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Hourly') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(news_hourly_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 15 | 1.26 |
Female | white | 51 | 1.40 |
Male | person of color | 13 | 1.03 |
Male | white | 26 | 1.02 |
news_hourly_raises_scores = merit_raises_combined[(merit_raises_combined['pay_rate_type'] == 'Hourly') & (merit_raises_combined['dept'] == 'News')].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(news_hourly_raises_scores)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 15 | 3.40 |
Female | white | 51 | 3.50 |
Male | person of color | 13 | 3.30 |
Male | white | 26 | 3.45 |
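# Current salaried News staff, split by hire date:
# 'bezos' = hired after 2013-10-04, 'graham' = hired on or before 2013-10-04.
bezos = df[(df['hire_date'] > '2013-10-04') & (df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]
graham = df[(df['hire_date'] < '2013-10-05') & (df['dept'] == 'News') & (df['pay_rate_type'] == 'Salaried')]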
bezos_gender = bezos.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender)
gender | count_nonzero | median |
---|---|---|
Male | 188 | 108395.00 |
Female | 230 | 94060.00 |
graham_gender = graham.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender)
gender | count_nonzero | median |
---|---|---|
Male | 132 | 133375.00 |
Female | 106 | 117930.08 |
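A simple way to put the two hiring cohorts side by side is to take the ratio of the female median to the male median within each cohort. The short calculation below uses the medians printed in the two tables above; the cohort labels restate the hire-date filters behind `bezos` and `graham`.

```python
# Medians copied from the bezos_gender and graham_gender tables above.
medians = {
    'hired after 2013-10-04': {'Female': 94060.00, 'Male': 108395.00},
    'hired on or before 2013-10-04': {'Female': 117930.08, 'Male': 133375.00},
}

# Female median as a fraction of the male median, per cohort.
ratios = {cohort: m['Female'] / m['Male'] for cohort, m in medians.items()}

for cohort, ratio in ratios.items():
    print(f"{cohort}: female/male median = {ratio:.3f} (gap {1 - ratio:.1%})")
```

By this measure, the women's median is about 86.8% of the men's among later hires and about 88.4% among earlier hires. Cohort medians mix differences in job, desk and tenure, so this describes the gap but does not explain it.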
bezos_race = bezos.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_race)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 253 | 104000.00 |
Black or African American (United States of America) | 39 | 100000.00 |
Prefer Not to Disclose (United States of America) | 12 | 94890.00 |
Hispanic or Latino (United States of America) | 35 | 94780.00 |
Asian (United States of America) | 44 | 94560.00 |
Two or More Races (United States of America) | 18 | 91090.00 |
graham_race = graham.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_race)
race_ethnicity | count_nonzero | median |
---|---|---|
Hispanic or Latino (United States of America) | 7 | 134324.81 |
White (United States of America) | 184 | 128076.79 |
Asian (United States of America) | 15 | 118821.01 |
Black or African American (United States of America) | 21 | 114488.29 |
bezos_race_group = bezos.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_race_group)
race_grouping | count_nonzero | median |
---|---|---|
unknown | 30 | 115920.00 |
white | 253 | 104000.00 |
person of color | 136 | 94530.00 |
graham_race_group = graham.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_race_group)
race_grouping | count_nonzero | median |
---|---|---|
unknown | 9 | 158340.00 |
white | 184 | 128076.79 |
person of color | 45 | 117924.49 |
bezos_gender_race_group = bezos.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
unknown | Male | 12 | 128420.00 |
white | Male | 128 | 110455.00 |
unknown | Female | 18 | 103890.00 |
person of color | Male | 48 | 99217.50 |
white | Female | 124 | 94920.00 |
person of color | Female | 88 | 93030.00 |
graham_gender_race_group = graham.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
unknown | Male | 7 | 152730.88 |
white | Male | 101 | 137799.88 |
person of color | Male | 24 | 118698.72 |
person of color | Female | 21 | 117924.49 |
white | Female | 83 | 117600.00 |
bezos_gender_race_group_age5 = bezos.groupby(['race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group_age5)
race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|
white | Female | 45-49 | 10.00 | 168735.00 |
white | Male | 60-64 | 6.00 | 165060.00 |
white | Male | 55-59 | 6.00 | 157077.97 |
white | Male | 40-44 | 17.00 | 142780.00 |
unknown | Female | 35-39 | 5.00 | 136560.00 |
white | Male | 45-49 | 13.00 | 134806.25 |
person of color | Male | 40-44 | 10.00 | 131735.00 |
person of color | Female | 40-44 | 10.00 | 130545.00 |
unknown | Male | 30-34 | 5.00 | 130000.00 |
white | Female | 50-54 | 7.00 | 129560.00 |
person of color | Male | 35-39 | 9.00 | 120000.00 |
white | Female | 35-39 | 19.00 | 116280.00 |
white | Female | 40-44 | 7.00 | 115780.00 |
white | Male | 35-39 | 32.00 | 113780.00 |
white | Male | 50-54 | 8.00 | 110207.80 |
white | Male | 30-34 | 25.00 | 100056.00 |
person of color | Female | 35-39 | 16.00 | 97420.00 |
person of color | Female | 30-34 | 20.00 | 96030.00 |
unknown | Female | 30-34 | 8.00 | 92500.00 |
person of color | Male | 30-34 | 10.00 | 90500.00 |
white | Female | 30-34 | 28.00 | 90010.00 |
person of color | Female | 25-29 | 35.00 | 90000.00 |
person of color | Male | 25-29 | 11.00 | 80880.00 |
white | Female | 25-29 | 41.00 | 80855.00 |
white | Male | 25-29 | 19.00 | 80560.00 |
person of color | Female | <25 | 5.00 | 72500.00 |
white | Female | <25 | 8.00 | 66640.00 |
graham_gender_race_group_age5 = graham.groupby(['race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group_age5)
race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|
white | Female | 65+ | 5.00 | 162355.42 |
white | Male | 35-39 | 11.00 | 157260.00 |
white | Female | 60-64 | 13.00 | 143908.85 |
white | Male | 50-54 | 20.00 | 142308.46 |
white | Male | 60-64 | 20.00 | 141732.27 |
white | Male | 55-59 | 14.00 | 140697.96 |
white | Male | 40-44 | 12.00 | 137750.42 |
white | Male | 65+ | 12.00 | 134610.40 |
person of color | Male | 50-54 | 11.00 | 131075.90 |
white | Female | 55-59 | 16.00 | 119149.34 |
white | Female | 50-54 | 15.00 | 113280.00 |
white | Female | 45-49 | 8.00 | 110541.34 |
white | Female | 40-44 | 6.00 | 109890.54 |
white | Male | 45-49 | 8.00 | 105116.81 |
person of color | Female | 50-54 | 5.00 | 104504.47 |
white | Female | 35-39 | 16.00 | 102552.82 |
bezos_gender_race_group_age5_tier = bezos.groupby(['race_grouping','gender','age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(bezos_gender_race_group_age5_tier)
race_grouping | gender | age_group_5 | tier | count_nonzero | median |
---|---|---|---|---|---|
white | Male | 45-49 | Tier 1 | 5.00 | 192065.00 |
white | Male | 40-44 | Tier 1 | 9.00 | 187910.00 |
white | Female | 45-49 | Tier 1 | 5.00 | 169560.00 |
person of color | Female | 40-44 | Tier 1 | 6.00 | 153570.00 |
white | Male | 35-39 | Tier 1 | 10.00 | 152218.27 |
person of color | Male | 40-44 | Tier 2 | 6.00 | 145310.00 |
unknown | Female | 35-39 | Tier 1 | 5.00 | 136560.00 |
unknown | Female | 30-34 | Tier 1 | 5.00 | 128780.00 |
white | Female | 35-39 | Tier 1 | 8.00 | 125110.00 |
white | Male | 30-34 | Tier 1 | 8.00 | 121895.00 |
white | Male | 45-49 | Tier 2 | 6.00 | 121810.00 |
white | Female | 25-29 | Tier 1 | 7.00 | 116780.00 |
white | Male | 35-39 | Tier 2 | 10.00 | 110170.00 |
person of color | Female | 35-39 | Tier 2 | 6.00 | 105060.00 |
person of color | Female | 30-34 | Tier 2 | 5.00 | 100780.00 |
white | Female | 35-39 | Tier 3 | 8.00 | 100356.89 |
white | Female | 30-34 | Tier 2 | 7.00 | 98280.16 |
white | Male | 30-34 | Tier 2 | 5.00 | 95780.00 |
white | Male | 35-39 | Tier 3 | 11.00 | 95000.00 |
white | Male | 30-34 | Tier 3 | 11.00 | 94000.00 |
person of color | Female | 25-29 | Tier 2 | 11.00 | 93060.00 |
person of color | Female | 35-39 | Tier 3 | 6.00 | 92420.00 |
white | Female | 25-29 | Tier 2 | 10.00 | 91670.00 |
person of color | Female | 30-34 | Tier 3 | 9.00 | 90009.88 |
white | Female | 30-34 | Tier 3 | 13.00 | 89280.00 |
person of color | Female | 25-29 | Tier 3 | 18.00 | 80785.00 |
white | Male | 25-29 | Tier 2 | 7.00 | 80780.00 |
white | Female | 25-29 | Tier 3 | 17.00 | 79060.00 |
white | Male | 25-29 | Tier 3 | 5.00 | 78060.00 |
white | Female | 25-29 | Tier 4 | 7.00 | 73440.00 |
graham_gender_race_group_age5_tier = graham.groupby(['race_grouping','gender','age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(graham_gender_race_group_age5_tier)
race_grouping | gender | age_group_5 | tier | count_nonzero | median |
---|---|---|---|---|---|
white | Female | 50-54 | Tier 1 | 5.00 | 194690.00 |
white | Male | 35-39 | Tier 1 | 6.00 | 184065.00 |
white | Male | 50-54 | Tier 1 | 6.00 | 182922.79 |
white | Male | 60-64 | Tier 1 | 6.00 | 182534.24 |
white | Male | 40-44 | Tier 1 | 6.00 | 159250.08 |
white | Female | 55-59 | Tier 1 | 6.00 | 153515.00 |
white | Male | 55-59 | Tier 2 | 13.00 | 143276.51 |
white | Male | 50-54 | Tier 2 | 10.00 | 133726.66 |
person of color | Male | 50-54 | Tier 2 | 7.00 | 131075.90 |
white | Male | 60-64 | Tier 2 | 7.00 | 127410.00 |
white | Male | 60-64 | Tier 3 | 5.00 | 120816.68 |
white | Male | 65+ | Tier 2 | 6.00 | 116906.05 |
white | Female | 55-59 | Tier 3 | 5.00 | 111903.66 |
white | Female | 55-59 | Tier 2 | 5.00 | 110453.45 |
white | Female | 50-54 | Tier 2 | 6.00 | 110295.80 |
white | Female | 35-39 | Tier 2 | 9.00 | 105104.65 |
white | Female | 60-64 | Tier 3 | 5.00 | 94884.19 |
white | Female | 35-39 | Tier 3 | 6.00 | 94534.94 |
white | Male | 45-49 | Tier 2 | 5.00 | 91837.77 |
news_groups = news_salaried.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
expected_medians = pd.merge(news_salaried, news_groups, on=['age_group_5', 'tier'])
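Merging flat columns with the MultiIndex columns that `.agg` returns is deprecated in pandas and raises a FutureWarning. A minimal alternative, assuming the same `age_group_5`/`tier` cells, is to attach each cell's median with `groupby(...).transform`. Its result is aligned to the original rows, so no merge is needed. The `expected_median` and `cell_size` column names are hypothetical, and the pay values are invented.

```python
import pandas as pd

# Toy stand-in for news_salaried; values are invented.
news = pd.DataFrame({
    "age_group_5": ["25-29", "25-29", "25-29", "40-44", "40-44"],
    "tier": ["Tier 3", "Tier 3", "Tier 3", "Tier 1", "Tier 1"],
    "current_base_pay": [78000.0, 80000.0, 83000.0, 150000.0, 160000.0],
})

# Broadcast each (age band, tier) cell's median back onto the cell's rows.
cell = news.groupby(["age_group_5", "tier"])["current_base_pay"]
news["expected_median"] = cell.transform("median")
news["cell_size"] = cell.transform("count")  # non-missing pay values per cell
news["disparity"] = news["current_base_pay"] - news["expected_median"]
news["disparity_pct"] = news["disparity"] / news["expected_median"]
print(news)
```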
below_expected_medians = expected_medians[expected_medians['current_base_pay'] < expected_medians[('current_base_pay', 'median')]].groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(below_expected_medians)
race_grouping | gender | count_nonzero |
---|---|---|
person of color | Female | 51 |
person of color | Male | 33 |
unknown | Female | 10 |
unknown | Male | 11 |
white | Female | 108 |
white | Male | 98 |
above_expected_medians = expected_medians[expected_medians['current_base_pay'] > expected_medians[('current_base_pay', 'median')]].groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(above_expected_medians)
race_grouping | gender | count_nonzero |
---|---|---|
person of color | Female | 48 |
person of color | Male | 35 |
unknown | Female | 9 |
unknown | Male | 7 |
white | Female | 93 |
white | Male | 122 |
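Both cells above use strict comparisons, so anyone paid exactly their cell median is counted in neither table. The sketch below, on toy values, classifies every row as below, at, or above the median in one cross-tabulation, so the three counts always add up to the group size.

```python
import numpy as np
import pandas as pd

# Toy disparities (pay minus cell median); values are invented.
d = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male", "Male"],
    "disparity": [-1500.0, 0.0, 2500.0, 0.0, 4000.0],
})

# np.sign maps each disparity to -1, 0 or 1; name all three outcomes explicitly.
d["position"] = np.sign(d["disparity"]).map({-1.0: "below", 0.0: "at median", 1.0: "above"})
counts = pd.crosstab(d["gender"], d["position"])  # missing combinations are filled with 0
print(counts)
```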
expected_medians['disparity'] = expected_medians['current_base_pay'] - expected_medians[('current_base_pay', 'median')]
expected_medians['disparity_pct'] = (expected_medians['current_base_pay'] - expected_medians[('current_base_pay', 'median')])/expected_medians[('current_base_pay', 'median')]
disparity = expected_medians.groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
person of color | Female | 99 | 0.00 |
person of color | Male | 68 | 0.00 |
unknown | Female | 19 | -587.50 |
unknown | Male | 18 | -955.00 |
white | Female | 201 | -1175.00 |
white | Male | 220 | 3780.00 |
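`np.count_nonzero` counts only values that are not zero, so in this table it skips everyone whose disparity is exactly 0. As a result, each count here equals the below-median and above-median counts added together (white women: 108 + 93 = 201), not the group's full size. The same caveat applies to the `disparity_pct` counts that follow. A sketch on toy values contrasting it with `size`:

```python
import numpy as np
import pandas as pd

# Toy disparities; group "b" is paid exactly at its median throughout.
toy = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "disparity": [0.0, -500.0, 700.0, 0.0, 0.0],
})

summary = toy.groupby("group")["disparity"].agg(
    n_rows="size",                # every row in the group
    n_nonzero=np.count_nonzero,   # rows whose disparity is not exactly 0
    median_disparity="median",
)
print(summary)
```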
disparity_pct_above = expected_medians[expected_medians['disparity_pct'] > .05].groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity_pct_above)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
person of color | Female | 32 | 14330.07 |
person of color | Male | 26 | 23107.50 |
unknown | Female | 5 | 11825.00 |
unknown | Male | 6 | 16715.00 |
white | Female | 74 | 18798.44 |
white | Male | 103 | 33500.00 |
disparity_pct_below = expected_medians[expected_medians['disparity_pct'] < -.05].groupby(['race_grouping','gender']).agg({'disparity': [np.count_nonzero, np.median]})
suppress(disparity_pct_below)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
person of color | Female | 36 | -10209.65 |
person of color | Male | 25 | -17205.06 |
unknown | Female | 8 | -28562.50 |
unknown | Male | 8 | -13360.00 |
white | Female | 79 | -13639.97 |
white | Male | 72 | -18719.56 |
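The two filters above can also be written as a single labeling step. The sketch below, on toy percentages, uses `np.select` with the same strict cutoffs (`< -0.05` and `> 0.05`). A value of exactly ±5% therefore lands in the middle band, just as it is excluded from both tables here.

```python
import numpy as np
import pandas as pd

pct = pd.Series([-0.12, -0.05, 0.0, 0.05, 0.08])  # toy disparity_pct values

# Same strict cutoffs as the cells above; everything else is "within 5%".
conditions = [pct < -0.05, pct > 0.05]
labels = ["more than 5% below", "more than 5% above"]
band = pd.Series(np.select(conditions, labels, default="within 5%"), index=pct.index)
print(band.value_counts())
```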
expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
race_grouping | gender | disparity_pct (count_nonzero) | disparity_pct (average) |
---|---|---|---|
person of color | Female | 99 | 0.02 |
person of color | Male | 68 | 0.05 |
unknown | Female | 19 | -0.05 |
unknown | Male | 18 | -0.01 |
white | Female | 201 | 0.05 |
white | Male | 220 | 0.09 |
white | Prefer not to disclose | 1 | 0.09 |
bezos_news_groups = bezos.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
bezos_expected_medians = pd.merge(bezos, bezos_news_groups, on=['age_group_5', 'tier'])
graham_news_groups = graham.groupby(['age_group_5','tier']).agg({'current_base_pay': [np.count_nonzero, np.median]})
graham_expected_medians = pd.merge(graham, graham_news_groups, on=['age_group_5', 'tier'])
bezos_expected_medians['disparity'] = bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')]
bezos_expected_medians['disparity_pct'] = (bezos_expected_medians['current_base_pay'] - bezos_expected_medians[('current_base_pay', 'median')])/bezos_expected_medians[('current_base_pay', 'median')]
graham_expected_medians['disparity'] = graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')]
graham_expected_medians['disparity_pct'] = (graham_expected_medians['current_base_pay'] - graham_expected_medians[('current_base_pay', 'median')])/graham_expected_medians[('current_base_pay', 'median')]
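Adding columns one at a time to a frame built by merging with MultiIndex aggregate output can trigger pandas' PerformanceWarning about a fragmented DataFrame. The same four lines are also repeated for each era. The sketch below uses a hypothetical helper, `add_disparity`, that computes the cell median with `transform`, adds all derived columns in one `assign` call, and is applied to each era in a loop. The toy frames stand in for `bezos` and `graham`.

```python
import pandas as pd


def add_disparity(frame: pd.DataFrame, keys=("age_group_5", "tier"), pay="current_base_pay") -> pd.DataFrame:
    """Hypothetical helper: return a copy of `frame` with cell-median disparity columns."""
    expected = frame.groupby(list(keys))[pay].transform("median")
    # assign returns a new frame with all derived columns added at once.
    return frame.assign(
        expected_median=expected,
        disparity=frame[pay] - expected,
        disparity_pct=(frame[pay] - expected) / expected,
    )


# Toy stand-ins for the two eras; values are invented.
eras = {
    "bezos": pd.DataFrame({"age_group_5": ["30-34"] * 2, "tier": ["Tier 2"] * 2,
                           "current_base_pay": [90000.0, 110000.0]}),
    "graham": pd.DataFrame({"age_group_5": ["50-54"] * 3, "tier": ["Tier 1"] * 3,
                            "current_base_pay": [150000.0, 160000.0, 200000.0]}),
}
with_disparity = {name: add_disparity(frame) for name, frame in eras.items()}
print(with_disparity["graham"])
```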
bezos_disparity_gender = bezos_expected_medians.groupby(['gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_gender)
gender | count_nonzero | average |
---|---|---|
Female | 212 | 0.04 |
Male | 171 | 0.08 |
bezos_disparity_race_group = bezos_expected_medians.groupby(['race_grouping']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_race_group)
race_grouping | count_nonzero | average |
---|---|---|
person of color | 124 | 0.04 |
unknown | 27 | -0.04 |
white | 233 | 0.07 |
bezos_disparity_gender_race_group = bezos_expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(bezos_disparity_gender_race_group)
race_grouping | gender | count_nonzero | average |
---|---|---|---|
person of color | Female | 79 | 0.03 |
person of color | Male | 45 | 0.07 |
unknown | Female | 17 | -0.05 |
unknown | Male | 10 | -0.02 |
white | Female | 116 | 0.05 |
white | Male | 116 | 0.09 |
graham_disparity_gender = graham_expected_medians.groupby(['gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_gender)
gender | count_nonzero | average |
---|---|---|
Female | 99 | 0.03 |
Male | 127 | 0.07 |
graham_disparity_race_group = graham_expected_medians.groupby(['race_grouping']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_race_group)
race_grouping | count_nonzero | average |
---|---|---|
person of color | 42 | 0.01 |
unknown | 8 | -0.03 |
white | 176 | 0.07 |
graham_disparity_gender_race_group = graham_expected_medians.groupby(['race_grouping','gender']).agg({'disparity_pct': [np.count_nonzero, np.average]})
suppress(graham_disparity_gender_race_group)
race_grouping | gender | count_nonzero | average |
---|---|---|---|
person of color | Female | 19 | 0.01 |
person of color | Male | 23 | 0.00 |
unknown | Male | 6 | -0.03 |
white | Female | 78 | 0.04 |
white | Male | 98 | 0.09 |
news_salaried_regression = news_salaried[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
news_salaried_regression = pd.get_dummies(news_salaried_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])
news_salaried_regression = news_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
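The hand-written rename map turns dummy column names into valid formula identifiers. A hypothetical helper that does this generically with a regular expression is sketched below. It produces different names from the map above (for example `age_group_5_25_29` rather than `age_group_5_25to29`), so it is not a drop-in replacement for the model formulas that follow.

```python
import re


def formula_safe(name: str) -> str:
    """Hypothetical helper: make a column name usable as a statsmodels formula term."""
    name = name.replace("<", "under_").replace("+", "_plus")
    name = re.sub(r"\W+", "_", name)  # each run of non-word characters becomes one underscore
    name = name.strip("_")
    if name and name[0].isdigit():
        name = "_" + name  # Python identifiers cannot start with a digit
    return name


print(formula_safe("age_group_5_25-29"))
```

Because `DataFrame.rename` accepts a callable, the helper could be applied as `frame.rename(columns=formula_safe)`.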
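# Note: this rebinds `sm` (imported earlier as statsmodels.api) to the formula API, which provides sm.ols(formula=..., data=...)
import statsmodels.formula.api as sm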
model1 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result1 = model1.fit()
result1.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.036 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.033 |
Method: | Least Squares | F-statistic: | 12.13 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 6.74e-06 |
Time: | 20:31:52 | Log-Likelihood: | -7942.3 |
No. Observations: | 657 | AIC: | 1.589e+04 |
Df Residuals: | 654 | BIC: | 1.590e+04 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.063e+05 | 4.31e+04 | 2.465 | 0.014 | 2.16e+04 | 1.91e+05 |
gender_Female | 7338.7585 | 4.32e+04 | 0.170 | 0.865 | -7.75e+04 | 9.22e+04 |
gender_Male | 2.389e+04 | 4.32e+04 | 0.553 | 0.580 | -6.09e+04 | 1.09e+05 |
Omnibus: | 179.942 | Durbin-Watson: | 1.702 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 429.949 |
Skew: | 1.429 | Prob(JB): | 4.34e-94 |
Kurtosis: | 5.746 | Cond. No. | 54.4 |
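In model 1 the intercept is not a baseline for either gender. The two gender columns appear to cover every row but one; the `disparity_pct` table above lists a single `Prefer not to disclose` entry. The omitted reference category therefore holds one observation, and the intercept and both gender coefficients are measured against that one-person baseline. This is why the individual gender terms have standard errors near 43,000 even though the model's F-statistic is significant. By the same logic, the intercept in model 2 is the `unknown` race group. The sketch below, on toy data, builds the dummies from the raw column with an explicit reference level using `C()` and `Treatment`, so the coefficient reads directly as the gap to that level.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy data; pay values are invented for illustration.
toy = pd.DataFrame({
    "gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "current_base_pay": [90000.0, 100000.0, 110000.0, 105000.0, 115000.0, 125000.0],
})

# Treatment(reference='Male') creates one dummy per non-reference level, so the
# intercept is the Male mean and the Female coefficient is the gap to it.
fit = smf.ols("current_base_pay ~ C(gender, Treatment(reference='Male'))", data=toy).fit()
print(fit.params)
```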
model2 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result2 = model2.fit()
result2.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.027 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.024 |
Method: | Least Squares | F-statistic: | 8.957 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000145 |
Time: | 20:31:52 | Log-Likelihood: | -7945.3 |
No. Observations: | 657 | AIC: | 1.590e+04 |
Df Residuals: | 654 | BIC: | 1.591e+04 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.256e+05 | 6939.188 | 18.102 | 0.000 | 1.12e+05 | 1.39e+05 |
race_grouping_white | 574.5152 | 7242.215 | 0.079 | 0.937 | -1.36e+04 | 1.48e+04 |
race_grouping_person_of_color | -1.549e+04 | 7650.339 | -2.024 | 0.043 | -3.05e+04 | -464.535 |
Omnibus: | 167.823 | Durbin-Watson: | 1.711 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 374.307 |
Skew: | 1.366 | Prob(JB): | 5.25e-82 |
Kurtosis: | 5.492 | Cond. No. | 9.02 |
model3 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result3 = model3.fit()
result3.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.056 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.051 |
Method: | Least Squares | F-statistic: | 9.737 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.18e-07 |
Time: | 20:31:52 | Log-Likelihood: | -7935.2 |
No. Observations: | 657 | AIC: | 1.588e+04 |
Df Residuals: | 652 | BIC: | 1.590e+04 |
Df Model: | 4 | ||
Covariance Type: | nonrobust |
variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.063e+05 | 4.33e+04 | 2.453 | 0.014 | 2.12e+04 | 1.91e+05 |
gender_Female | 1.194e+04 | 4.28e+04 | 0.279 | 0.780 | -7.21e+04 | 9.6e+04 |
gender_Male | 2.708e+04 | 4.28e+04 | 0.633 | 0.527 | -5.7e+04 | 1.11e+05 |
race_grouping_white | 44.0015 | 7143.588 | 0.006 | 0.995 | -1.4e+04 | 1.41e+04 |
race_grouping_person_of_color | -1.413e+04 | 7550.212 | -1.872 | 0.062 | -2.9e+04 | 692.197 |
Omnibus: | 174.613 | Durbin-Watson: | 1.707 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 410.327 |
Skew: | 1.394 | Prob(JB): | 7.92e-90 |
Kurtosis: | 5.687 | Cond. No. | 63.6 |
new_news_salaried_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_news_salaried_regression['predicted'] = result3.predict(new_news_salaried_regression)
new_news_salaried_regression
row | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | age | predicted |
---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 40 | 118280.62 |
1 | 0 | 1 | 1 | 0 | 40 | 133419.52 |
2 | 1 | 0 | 0 | 1 | 40 | 104103.15 |
3 | 0 | 1 | 0 | 1 | 40 | 119242.05 |
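Model 3 is additive, so the predicted race gap is the same for both genders and equals the difference of the two race coefficients. The gender gap works the same way. Checking this against the predictions above:

$$
\begin{aligned}
\hat{y}_{F,\mathrm{white}} - \hat{y}_{F,\mathrm{POC}} &= 118280.62 - 104103.15 = 14177.47\\
\hat{y}_{M,\mathrm{white}} - \hat{y}_{M,\mathrm{POC}} &= 133419.52 - 119242.05 = 14177.47 \approx \beta_{\mathrm{white}} - \beta_{\mathrm{POC}} = 44.0015 - (-1.413\times 10^{4}) \approx 14174\\
\hat{y}_{M,\mathrm{white}} - \hat{y}_{F,\mathrm{white}} &= 133419.52 - 118280.62 = 15138.90 \approx \beta_{\mathrm{Male}} - \beta_{\mathrm{Female}} = 2.708\times 10^{4} - 1.194\times 10^{4} = 15140
\end{aligned}
$$

The small differences come from the rounded coefficients in the printed summary. The `age` column in the prediction frame has no effect, since model 3 does not include age.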
model4 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result4 = model4.fit()
result4.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.237 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.224 |
Method: | Least Squares | F-statistic: | 18.22 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 7.96e-32 |
Time: | 20:31:53 | Log-Likelihood: | -7865.4 |
No. Observations: | 657 | AIC: | 1.575e+04 |
Df Residuals: | 645 | BIC: | 1.581e+04 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.119e+05 | 3.53e+04 | 3.170 | 0.002 | 4.26e+04 | 1.81e+05 |
gender_Female | -2099.5121 | 3.88e+04 | -0.054 | 0.957 | -7.84e+04 | 7.42e+04 |
gender_Male | 4711.0354 | 3.89e+04 | 0.121 | 0.904 | -7.16e+04 | 8.1e+04 |
age_group_5_25_under | -3.77e+04 | 9634.532 | -3.913 | 0.000 | -5.66e+04 | -1.88e+04 |
age_group_5_25to29 | -2.255e+04 | 5195.953 | -4.340 | 0.000 | -3.28e+04 | -1.23e+04 |
age_group_5_30to34 | -5545.8004 | 4968.360 | -1.116 | 0.265 | -1.53e+04 | 4210.313 |
age_group_5_35to39 | 1.177e+04 | 5092.608 | 2.312 | 0.021 | 1773.930 | 2.18e+04 |
age_group_5_40to44 | 2.259e+04 | 5736.192 | 3.938 | 0.000 | 1.13e+04 | 3.39e+04 |
age_group_5_45to49 | 2.294e+04 | 6289.504 | 3.648 | 0.000 | 1.06e+04 | 3.53e+04 |
age_group_5_50to54 | 2.326e+04 | 5759.257 | 4.039 | 0.000 | 1.2e+04 | 3.46e+04 |
age_group_5_55to59 | 2.182e+04 | 6449.835 | 3.383 | 0.001 | 9154.172 | 3.45e+04 |
age_group_5_60to64 | 3.825e+04 | 6368.979 | 6.005 | 0.000 | 2.57e+04 | 5.08e+04 |
age_group_5_65_over | 3.705e+04 | 9161.499 | 4.044 | 0.000 | 1.91e+04 | 5.5e+04 |
Omnibus: | 190.345 | Durbin-Watson: | 1.900 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 541.300 |
Skew: | 1.426 | Prob(JB): | 2.87e-118 |
Kurtosis: | 6.412 | Cond. No. | 1.50e+15 |
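The condition number of 1.50e+15 points to exact collinearity. Assuming every employee falls in exactly one age band, the ten age dummies sum to 1 on every row and duplicate the intercept. statsmodels still returns estimates because `fit()` uses a pseudoinverse by default. The age coefficients are then identified only relative to one another, not individually. The same applies to the later models that add full sets of tier and years-of-service dummies. The sketch below, on toy bands, shows the rank deficiency and the usual fix: dropping one band to serve as the reference.

```python
import numpy as np
import pandas as pd

# Toy age bands; every row belongs to exactly one band.
bands = pd.Series(["25-29", "30-34", "35-39", "30-34", "25-29", "35-39"], name="age_group_5")

full = pd.get_dummies(bands, dtype=float)  # one column per band
X_full = np.column_stack([np.ones(len(bands)), full.to_numpy()])

reduced = pd.get_dummies(bands, drop_first=True, dtype=float)  # "25-29" becomes the reference
X_reduced = np.column_stack([np.ones(len(bands)), reduced.to_numpy()])

print(X_full.shape[1], np.linalg.matrix_rank(X_full))        # 4 columns, rank 3
print(X_reduced.shape[1], np.linalg.matrix_rank(X_reduced))  # 3 columns, rank 3
```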
model5 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result5 = model5.fit()
result5.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.238 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.225 |
Method: | Least Squares | F-statistic: | 18.35 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 4.61e-32 |
Time: | 20:31:53 | Log-Likelihood: | -7864.8 |
No. Observations: | 657 | AIC: | 1.575e+04 |
Df Residuals: | 645 | BIC: | 1.581e+04 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.196e+05 | 5861.680 | 20.403 | 0.000 | 1.08e+05 | 1.31e+05 |
race_grouping_white | -5562.1439 | 6542.297 | -0.850 | 0.396 | -1.84e+04 | 7284.629 |
race_grouping_person_of_color | -1.303e+04 | 6928.255 | -1.881 | 0.060 | -2.66e+04 | 574.645 |
age_group_5_25_under | -3.741e+04 | 8954.320 | -4.178 | 0.000 | -5.5e+04 | -1.98e+04 |
age_group_5_25to29 | -2.199e+04 | 3810.252 | -5.772 | 0.000 | -2.95e+04 | -1.45e+04 |
age_group_5_30to34 | -5447.2684 | 3800.522 | -1.433 | 0.152 | -1.29e+04 | 2015.622 |
age_group_5_35to39 | 1.23e+04 | 3644.703 | 3.376 | 0.001 | 5146.772 | 1.95e+04 |
age_group_5_40to44 | 2.533e+04 | 4567.605 | 5.546 | 0.000 | 1.64e+04 | 3.43e+04 |
age_group_5_45to49 | 2.342e+04 | 5203.606 | 4.502 | 0.000 | 1.32e+04 | 3.36e+04 |
age_group_5_50to54 | 2.514e+04 | 4569.941 | 5.502 | 0.000 | 1.62e+04 | 3.41e+04 |
age_group_5_55to59 | 2.237e+04 | 5407.680 | 4.137 | 0.000 | 1.18e+04 | 3.3e+04 |
age_group_5_60to64 | 3.816e+04 | 5321.031 | 7.171 | 0.000 | 2.77e+04 | 4.86e+04 |
age_group_5_65_over | 3.772e+04 | 8497.451 | 4.439 | 0.000 | 2.1e+04 | 5.44e+04 |
Omnibus: | 189.869 | Durbin-Watson: | 1.897 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 531.171 |
Skew: | 1.430 | Prob(JB): | 4.55e-116 |
Kurtosis: | 6.351 | Cond. No. | 1.61e+15 |
model6 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result6 = model6.fit()
result6.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.243 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.228 |
Method: | Least Squares | F-statistic: | 15.89 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.24e-31 |
Time: | 20:31:53 | Log-Likelihood: | -7862.7 |
No. Observations: | 657 | AIC: | 1.575e+04 |
Df Residuals: | 643 | BIC: | 1.582e+04 |
Df Model: | 13 | ||
Covariance Type: | nonrobust |
variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.171e+05 | 3.58e+04 | 3.271 | 0.001 | 4.68e+04 | 1.87e+05 |
gender_Female | -529.3607 | 3.88e+04 | -0.014 | 0.989 | -7.67e+04 | 7.56e+04 |
gender_Male | 5773.4074 | 3.88e+04 | 0.149 | 0.882 | -7.04e+04 | 8.19e+04 |
race_grouping_white | -5564.1416 | 6533.784 | -0.852 | 0.395 | -1.84e+04 | 7265.991 |
race_grouping_person_of_color | -1.25e+04 | 6922.302 | -1.805 | 0.072 | -2.61e+04 | 1098.023 |
age_group_5_25_under | -3.578e+04 | 9663.473 | -3.703 | 0.000 | -5.48e+04 | -1.68e+04 |
age_group_5_25to29 | -2.099e+04 | 5237.642 | -4.007 | 0.000 | -3.13e+04 | -1.07e+04 |
age_group_5_30to34 | -5182.1108 | 4958.720 | -1.045 | 0.296 | -1.49e+04 | 4555.130 |
age_group_5_35to39 | 1.201e+04 | 5090.717 | 2.359 | 0.019 | 2013.977 | 2.2e+04 |
age_group_5_40to44 | 2.434e+04 | 5796.240 | 4.200 | 0.000 | 1.3e+04 | 3.57e+04 |
age_group_5_45to49 | 2.291e+04 | 6306.945 | 3.633 | 0.000 | 1.05e+04 | 3.53e+04 |
age_group_5_50to54 | 2.417e+04 | 5805.641 | 4.163 | 0.000 | 1.28e+04 | 3.56e+04 |
age_group_5_55to59 | 2.184e+04 | 6477.911 | 3.371 | 0.001 | 9116.357 | 3.46e+04 |
age_group_5_60to64 | 3.756e+04 | 6410.429 | 5.859 | 0.000 | 2.5e+04 | 5.01e+04 |
age_group_5_65_over | 3.621e+04 | 9238.693 | 3.919 | 0.000 | 1.81e+04 | 5.44e+04 |
Omnibus: | 190.682 | Durbin-Watson: | 1.890 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 541.397 |
Skew: | 1.429 | Prob(JB): | 2.74e-118 |
Kurtosis: | 6.407 | Cond. No. | 1.23e+15 |
model7 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4')
result7 = model7.fit()
result7.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.445 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.430 |
Method: | Least Squares | F-statistic: | 30.13 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 2.33e-70 |
Time: | 20:31:53 | Log-Likelihood: | -7760.8 |
No. Observations: | 657 | AIC: | 1.556e+04 |
Df Residuals: | 639 | BIC: | 1.564e+04 |
Df Model: | 17 | ||
Covariance Type: | nonrobust |
variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.076e+05 | 3.17e+04 | 3.393 | 0.001 | 4.53e+04 | 1.7e+05 |
gender_Female | 3898.5269 | 3.34e+04 | 0.117 | 0.907 | -6.16e+04 | 6.94e+04 |
gender_Male | 7604.6247 | 3.34e+04 | 0.228 | 0.820 | -5.79e+04 | 7.32e+04 |
race_grouping_white | 1.53e+04 | 5815.295 | 2.630 | 0.009 | 3876.083 | 2.67e+04 |
race_grouping_person_of_color | 1.033e+04 | 6175.636 | 1.673 | 0.095 | -1795.658 | 2.25e+04 |
age_group_5_25_under | -2.7e+04 | 8441.069 | -3.199 | 0.001 | -4.36e+04 | -1.04e+04 |
age_group_5_25to29 | -1.282e+04 | 4616.125 | -2.778 | 0.006 | -2.19e+04 | -3757.902 |
age_group_5_30to34 | -1384.4169 | 4303.793 | -0.322 | 0.748 | -9835.703 | 7066.870 |
age_group_5_35to39 | 8654.0619 | 4451.474 | 1.944 | 0.052 | -87.224 | 1.74e+04 |
age_group_5_40to44 | 1.653e+04 | 5034.950 | 3.283 | 0.001 | 6643.969 | 2.64e+04 |
age_group_5_45to49 | 1.872e+04 | 5521.185 | 3.391 | 0.001 | 7879.387 | 2.96e+04 |
age_group_5_50to54 | 2.268e+04 | 5058.230 | 4.484 | 0.000 | 1.28e+04 | 3.26e+04 |
age_group_5_55to59 | 2.101e+04 | 5664.358 | 3.709 | 0.000 | 9886.779 | 3.21e+04 |
age_group_5_60to64 | 3.159e+04 | 5551.529 | 5.691 | 0.000 | 2.07e+04 | 4.25e+04 |
age_group_5_65_over | 2.963e+04 | 8036.845 | 3.687 | 0.000 | 1.39e+04 | 4.54e+04 |
tier_Tier_1 | 1.562e+04 | 8239.513 | 1.895 | 0.059 | -564.172 | 3.18e+04 |
tier_Tier_2 | -1.519e+04 | 8203.583 | -1.851 | 0.065 | -3.13e+04 | 921.935 |
tier_Tier_3 | -3.719e+04 | 8262.558 | -4.501 | 0.000 | -5.34e+04 | -2.1e+04 |
tier_Tier_4 | -4.003e+04 | 9950.755 | -4.022 | 0.000 | -5.96e+04 | -2.05e+04 |
Omnibus: | 274.112 | Durbin-Watson: | 1.895 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 1550.092 |
Skew: | 1.788 | Prob(JB): | 0.00 |
Kurtosis: | 9.620 | Cond. No. | 1.33e+15 |
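The notebook imports `summary_col` but prints each model separately. The sketch below uses simulated data (not the Post data) to put nested models side by side, which makes it easier to see how a coefficient shifts as controls such as tier are added. The variable names and effect sizes are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Simulated data with a fixed seed; all values are invented.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "tier": rng.choice(["Tier 1", "Tier 2", "Tier 3"], n),
})
base = toy["tier"].map({"Tier 1": 150000.0, "Tier 2": 110000.0, "Tier 3": 85000.0})
toy["pay"] = base - 5000.0 * toy["female"] + rng.normal(0, 8000, n)

m_a = smf.ols("pay ~ female", data=toy).fit()
m_b = smf.ols("pay ~ female + C(tier)", data=toy).fit()

table = summary_col(
    [m_a, m_b],
    model_names=["gender only", "gender + tier"],
    stars=True,
    info_dict={"N": lambda r: f"{int(r.nobs)}"},
)
print(table)
```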
model8 = sm.ols(data=news_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4 + years_of_service_grouped_0 + years_of_service_grouped_1to2 + years_of_service_grouped_3to5 + years_of_service_grouped_6to10 + years_of_service_grouped_11to15 + years_of_service_grouped_16to20 + years_of_service_grouped_21to25 + years_of_service_grouped_25_over')
result8 = model8.fit()
result8.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.449 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.428 |
Method: | Least Squares | F-statistic: | 21.42 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 2.98e-66 |
Time: | 20:31:53 | Log-Likelihood: | -7758.7 |
No. Observations: | 657 | AIC: | 1.557e+04 |
Df Residuals: | 632 | BIC: | 1.568e+04 |
Df Model: | 24 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 9.685e+04 | 2.87e+04 | 3.374 | 0.001 | 4.05e+04 | 1.53e+05 |
gender_Female | 3210.6982 | 3.35e+04 | 0.096 | 0.924 | -6.27e+04 | 6.91e+04 |
gender_Male | 6696.3049 | 3.36e+04 | 0.200 | 0.842 | -5.92e+04 | 7.26e+04 |
race_grouping_white | 1.546e+04 | 5884.428 | 2.627 | 0.009 | 3901.135 | 2.7e+04 |
race_grouping_person_of_color | 1.03e+04 | 6214.884 | 1.658 | 0.098 | -1899.931 | 2.25e+04 |
age_group_5_25_under | -3.108e+04 | 8815.438 | -3.526 | 0.000 | -4.84e+04 | -1.38e+04 |
age_group_5_25to29 | -1.684e+04 | 4828.850 | -3.488 | 0.001 | -2.63e+04 | -7358.232 |
age_group_5_30to34 | -4151.1843 | 4408.927 | -0.942 | 0.347 | -1.28e+04 | 4506.734 |
age_group_5_35to39 | 6480.4160 | 4467.519 | 1.451 | 0.147 | -2292.561 | 1.53e+04 |
age_group_5_40to44 | 1.457e+04 | 5025.312 | 2.899 | 0.004 | 4698.737 | 2.44e+04 |
age_group_5_45to49 | 1.816e+04 | 5414.644 | 3.353 | 0.001 | 7522.975 | 2.88e+04 |
age_group_5_50to54 | 2.363e+04 | 5056.807 | 4.673 | 0.000 | 1.37e+04 | 3.36e+04 |
age_group_5_55to59 | 2.221e+04 | 5663.572 | 3.922 | 0.000 | 1.11e+04 | 3.33e+04 |
age_group_5_60to64 | 3.27e+04 | 5769.701 | 5.667 | 0.000 | 2.14e+04 | 4.4e+04 |
age_group_5_65_over | 3.119e+04 | 8418.597 | 3.705 | 0.000 | 1.47e+04 | 4.77e+04 |
tier_Tier_1 | 1.567e+04 | 8302.573 | 1.888 | 0.059 | -629.085 | 3.2e+04 |
tier_Tier_2 | -1.486e+04 | 8275.321 | -1.795 | 0.073 | -3.11e+04 | 1392.889 |
tier_Tier_3 | -3.701e+04 | 8325.110 | -4.445 | 0.000 | -5.34e+04 | -2.07e+04 |
tier_Tier_4 | -3.97e+04 | 1e+04 | -3.962 | 0.000 | -5.94e+04 | -2e+04 |
years_of_service_grouped_0 | 1.802e+04 | 5905.767 | 3.052 | 0.002 | 6425.182 | 2.96e+04 |
years_of_service_grouped_1to2 | 1.304e+04 | 4635.235 | 2.813 | 0.005 | 3936.179 | 2.21e+04 |
years_of_service_grouped_3to5 | 1.64e+04 | 4730.430 | 3.468 | 0.001 | 7114.931 | 2.57e+04 |
years_of_service_grouped_6to10 | 1.061e+04 | 4823.055 | 2.199 | 0.028 | 1134.325 | 2.01e+04 |
years_of_service_grouped_11to15 | 1.288e+04 | 5958.755 | 2.162 | 0.031 | 1181.946 | 2.46e+04 |
years_of_service_grouped_16to20 | 7298.5208 | 5859.005 | 1.246 | 0.213 | -4206.951 | 1.88e+04 |
years_of_service_grouped_21to25 | 8241.3903 | 5964.166 | 1.382 | 0.168 | -3470.589 | 2e+04 |
years_of_service_grouped_25_over | 1.036e+04 | 6732.595 | 1.539 | 0.124 | -2861.121 | 2.36e+04 |
Omnibus: | 267.644 | Durbin-Watson: | 1.900 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 1473.437 |
Skew: | 1.749 | Prob(JB): | 0.00 |
Kurtosis: | 9.449 | Cond. No. | 2.59e+15 |
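The condition number for model8 is 2.59e+15, and several of the models that follow report even larger values. Numbers this large point to the dummy-variable trap. Each formula keeps the intercept and also adds a dummy for every listed level of a category: both `gender_Female` and `gender_Male`, every `age_group_5_*` bucket, and so on. When a set of dummies sums to 1 on every row, it duplicates the intercept column, so the design matrix is rank-deficient. statsmodels' OLS fits with a pseudoinverse by default, which means it still reports numbers. However, the coefficients of the collinear terms (such as the ±2.122e+15 values reported for model9) are not separately identified. Only differences between levels of the same category, and the fitted values, are identified.

The sketch below uses hypothetical toy data, not the Post data. It shows the rank deficiency and one standard remedy: letting patsy encode the category against an explicit reference level.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: 200 rows, two genders, a noisy pay outcome.
rng = np.random.default_rng(0)
toy = pd.DataFrame({"gender": rng.choice(["Female", "Male"], size=200)})
toy["pay"] = 50_000 + 3_000 * (toy["gender"] == "Male") + rng.normal(0, 5_000, size=200)

# Trap: intercept plus one dummy per level. dtype=int keeps patsy from
# treating the dummies as boolean categoricals.
dummies_all = pd.get_dummies(toy, columns=["gender"], dtype=int)
trap = smf.ols("pay ~ gender_Female + gender_Male", data=dummies_all).fit()
# gender_Female + gender_Male equals the intercept column, so rank < columns.
trap_rank = np.linalg.matrix_rank(trap.model.exog)

# Remedy: encode gender with an explicit reference level, so only one
# gender column enters alongside the intercept.
fixed = smf.ols("pay ~ C(gender, Treatment(reference='Female'))", data=toy).fit()
fixed_rank = np.linalg.matrix_rank(fixed.model.exog)

print("trap:  rank", trap_rank, "of", trap.model.exog.shape[1], "columns")
print("fixed: rank", fixed_rank, "of", fixed.model.exog.shape[1], "columns")
print(fixed.params)
```

In this notebook, that could mean building formulas from the raw category columns before `pd.get_dummies`, for example `base_pay_change ~ C(gender) + C(race_grouping) + C(age_group_5)`. Alternatively, call `pd.get_dummies(..., drop_first=True)` so that one level per category is left out. Either way, each coefficient then reads as a gap relative to the omitted level.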
merit_raises_combined_salaried_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'News') & (merit_raises_combined['pay_rate_type'] == 'Salaried')]
merit_raises_combined_salaried_regression = pd.get_dummies(merit_raises_combined_salaried_regression, columns=['gender','race_grouping','age_group_5'])
merit_raises_combined_salaried_regression = merit_raises_combined_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
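The rename dictionaries exist because patsy parses formula terms as Python expressions. A column such as `age_group_5_<25` or `tier_Tier 1` cannot appear in a formula unless it is quoted with `Q()`. A small helper can derive the same formula-safe names mechanically. The function `sanitize_dummy_name` below is hypothetical, and its regex rules are assumptions inferred from the label patterns visible in this notebook. It reproduces the age, tier, service and race mappings used in the rename dictionaries. `DataFrame.rename(columns=...)` accepts a callable, so the helper can be passed in directly.

```python
import re
import pandas as pd


def sanitize_dummy_name(col: str) -> str:
    """Hypothetical helper: turn get_dummies labels into formula-safe names."""
    col = re.sub(r"<(\d+)$", r"\1_under", col)    # 'age_group_5_<25' -> 'age_group_5_25_under'
    col = re.sub(r"(\d+)\+$", r"\1_over", col)    # 'age_group_5_65+' -> 'age_group_5_65_over'
    col = re.sub(r"(\d+)-(\d+)", r"\1to\2", col)  # 'age_group_5_25-29' -> 'age_group_5_25to29'
    return col.replace(" ", "_")                  # 'tier_Tier 1' -> 'tier_Tier_1'


# Empty frame whose column labels mimic the get_dummies output used above.
example = pd.DataFrame(columns=[
    "gender_Female",
    "age_group_5_<25",
    "age_group_5_25-29",
    "age_group_5_65+",
    "tier_Tier 1",
    "years_of_service_grouped_1-2",
    "years_of_service_grouped_25+",
    "race_grouping_person of color",
])
renamed = example.rename(columns=sanitize_dummy_name)
print(list(renamed.columns))
```

Used here, the literal dictionary would become `merit_raises_combined_salaried_regression.rename(columns=sanitize_dummy_name)`. Any label shape not covered by these three rules would need its own rule.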
model9 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result9 = model9.fit()
result9.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.003 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.001 |
Method: | Least Squares | F-statistic: | 1.232 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.292 |
Time: | 20:31:53 | Log-Likelihood: | -6627.8 |
No. Observations: | 761 | AIC: | 1.326e+04 |
Df Residuals: | 758 | BIC: | 1.328e+04 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -2.122e+15 | 9.84e+15 | -0.216 | 0.829 | -2.14e+16 | 1.72e+16 |
gender_Female | 2.122e+15 | 9.84e+15 | 0.216 | 0.829 | -1.72e+16 | 2.14e+16 |
gender_Male | 2.122e+15 | 9.84e+15 | 0.216 | 0.829 | -1.72e+16 | 2.14e+16 |
Omnibus: | 347.322 | Durbin-Watson: | 1.876 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2241.002 |
Skew: | 1.962 | Prob(JB): | 0.00 |
Kurtosis: | 10.435 | Cond. No. | 3.93e+14 |
model10 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result10 = model10.fit()
result10.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.007 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.004 |
Method: | Least Squares | F-statistic: | 2.619 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0735 |
Time: | 20:31:53 | Log-Likelihood: | -6626.4 |
No. Observations: | 761 | AIC: | 1.326e+04 |
Df Residuals: | 758 | BIC: | 1.327e+04 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3487.7143 | 391.906 | 8.899 | 0.000 | 2718.364 | 4257.064 |
race_grouping_white | -353.2394 | 396.584 | -0.891 | 0.373 | -1131.772 | 425.293 |
race_grouping_person_of_color | -617.5097 | 408.291 | -1.512 | 0.131 | -1419.025 | 184.006 |
Omnibus: | 337.805 | Durbin-Watson: | 1.860 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2092.674 |
Skew: | 1.912 | Prob(JB): | 0.00 |
Kurtosis: | 10.168 | Cond. No. | 16.6 |
model11 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result11 = model11.fit()
result11.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.010 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.004 |
Method: | Least Squares | F-statistic: | 1.831 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.121 |
Time: | 20:31:53 | Log-Likelihood: | -6625.4 |
No. Observations: | 761 | AIC: | 1.326e+04 |
Df Residuals: | 756 | BIC: | 1.328e+04 |
Df Model: | 4 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -2.279e+15 | 9.78e+15 | -0.233 | 0.816 | -2.15e+16 | 1.69e+16 |
gender_Female | 2.279e+15 | 9.78e+15 | 0.233 | 0.816 | -1.69e+16 | 2.15e+16 |
gender_Male | 2.279e+15 | 9.78e+15 | 0.233 | 0.816 | -1.69e+16 | 2.15e+16 |
race_grouping_white | -372.8852 | 397.016 | -0.939 | 0.348 | -1152.271 | 406.501 |
race_grouping_person_of_color | -620.5414 | 408.416 | -1.519 | 0.129 | -1422.306 | 181.224 |
Omnibus: | 342.727 | Durbin-Watson: | 1.857 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2178.294 |
Skew: | 1.936 | Prob(JB): | 0.00 |
Kurtosis: | 10.329 | Cond. No. | 4.70e+14 |
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result11.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression
|  | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | predicted |
---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 3058.11 |
1 | 0 | 1 | 1 | 0 | 3209.11 |
2 | 1 | 0 | 0 | 1 | 2810.46 |
3 | 0 | 1 | 0 | 1 | 2961.46 |
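model11 is additive. As a result, the predicted gap between two rows that differ only in race equals the difference between the two race coefficients, whatever values the ill-conditioned intercept and gender terms take. The check below uses only numbers printed above.

It also recovers a gender gap of about 151 in `base_pay_change`. That gap cannot be read off the summary directly, because both gender coefficients print as 2.279e+15, which is rounded far too coarsely to show a difference of that size.

```python
# Race coefficients copied from the result11 summary above.
beta_white = -372.8852
beta_poc = -620.5414

# Predicted values copied from the prediction table above.
predicted = {
    ("Female", "white"): 3058.11,
    ("Male", "white"): 3209.11,
    ("Female", "person_of_color"): 2810.46,
    ("Male", "person_of_color"): 2961.46,
}

# Race gap implied by the coefficients vs. read from the predictions.
race_gap_coefs = beta_white - beta_poc
race_gap_female = predicted[("Female", "white")] - predicted[("Female", "person_of_color")]
race_gap_male = predicted[("Male", "white")] - predicted[("Male", "person_of_color")]

# Gender gap, only recoverable from the predictions.
gender_gap_white = predicted[("Male", "white")] - predicted[("Female", "white")]
gender_gap_poc = predicted[("Male", "person_of_color")] - predicted[("Female", "person_of_color")]

print(f"race gap from coefficients: {race_gap_coefs:.2f}")
print(f"race gap from predictions:  {race_gap_female:.2f} (female), {race_gap_male:.2f} (male)")
print(f"gender gap from predictions: {gender_gap_white:.2f} (white), {gender_gap_poc:.2f} (person of color)")
```

The two race-gap figures agree to within the rounding of the printed predictions. The gender gap comes out the same for both race groups, as an additive model requires.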
model12 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result12 = model12.fit()
result12.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.047 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.035 |
Method: | Least Squares | F-statistic: | 4.078 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 3.96e-05 |
Time: | 20:31:53 | Log-Likelihood: | -6610.9 |
No. Observations: | 761 | AIC: | 1.324e+04 |
Df Residuals: | 751 | BIC: | 1.329e+04 |
Df Model: | 9 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -6.917e+14 | 1e+16 | -0.069 | 0.945 | -2.03e+16 | 1.89e+16 |
gender_Female | 6.676e+14 | 9.65e+15 | 0.069 | 0.945 | -1.83e+16 | 1.96e+16 |
gender_Male | 6.676e+14 | 9.65e+15 | 0.069 | 0.945 | -1.83e+16 | 1.96e+16 |
age_group_5_25_under | -1.208e+13 | 1.75e+14 | -0.069 | 0.945 | -3.55e+14 | 3.31e+14 |
age_group_5_25to29 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_30to34 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_35to39 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_40to44 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_45to49 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_50to54 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_55to59 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_60to64 | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
age_group_5_65_over | 2.418e+13 | 3.5e+14 | 0.069 | 0.945 | -6.62e+14 | 7.11e+14 |
Omnibus: | 352.442 | Durbin-Watson: | 1.894 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2391.173 |
Skew: | 1.977 | Prob(JB): | 0.00 |
Kurtosis: | 10.731 | Cond. No. | 4.14e+16 |
model13 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result13 = model13.fit()
result13.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.050 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.038 |
Method: | Least Squares | F-statistic: | 3.968 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 2.67e-05 |
Time: | 20:31:53 | Log-Likelihood: | -6609.4 |
No. Observations: | 761 | AIC: | 1.324e+04 |
Df Residuals: | 750 | BIC: | 1.329e+04 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 5.394e+15 | 1.06e+16 | 0.510 | 0.610 | -1.54e+16 | 2.61e+16 |
race_grouping_white | -271.7360 | 404.582 | -0.672 | 0.502 | -1065.984 | 522.512 |
race_grouping_person_of_color | -594.1019 | 412.284 | -1.441 | 0.150 | -1403.470 | 215.266 |
age_group_5_25_under | -2.268e+13 | 4.44e+13 | -0.510 | 0.610 | -1.1e+14 | 6.45e+13 |
age_group_5_25to29 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_30to34 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_35to39 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_40to44 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_45to49 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_50to54 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_55to59 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_60to64 | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
age_group_5_65_over | -5.394e+15 | 1.06e+16 | -0.510 | 0.610 | -2.61e+16 | 1.54e+16 |
Omnibus: | 341.711 | Durbin-Watson: | 1.876 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2213.427 |
Skew: | 1.921 | Prob(JB): | 0.00 |
Kurtosis: | 10.419 | Cond. No. | 4.17e+16 |
model14 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result14 = model14.fit()
result14.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.054 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.040 |
Method: | Least Squares | F-statistic: | 3.905 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.61e-05 |
Time: | 20:31:53 | Log-Likelihood: | -6607.8 |
No. Observations: | 761 | AIC: | 1.324e+04 |
Df Residuals: | 749 | BIC: | 1.330e+04 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -9.183e+14 | 1e+16 | -0.092 | 0.927 | -2.06e+16 | 1.87e+16 |
gender_Female | 8.943e+14 | 9.75e+15 | 0.092 | 0.927 | -1.82e+16 | 2e+16 |
gender_Male | 8.943e+14 | 9.75e+15 | 0.092 | 0.927 | -1.82e+16 | 2e+16 |
race_grouping_white | -252.9941 | 394.499 | -0.641 | 0.522 | -1027.450 | 521.461 |
race_grouping_person_of_color | -560.3133 | 404.519 | -1.385 | 0.166 | -1354.439 | 233.812 |
age_group_5_25_under | 1.721e+13 | 1.88e+14 | 0.092 | 0.927 | -3.51e+14 | 3.86e+14 |
age_group_5_25to29 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_30to34 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_35to39 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_40to44 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_45to49 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_50to54 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_55to59 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_60to64 | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
age_group_5_65_over | 2.405e+13 | 2.62e+14 | 0.092 | 0.927 | -4.91e+14 | 5.39e+14 |
Omnibus: | 348.013 | Durbin-Watson: | 1.876 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2330.961 |
Skew: | 1.951 | Prob(JB): | 0.00 |
Kurtosis: | 10.634 | Cond. No. | 3.59e+17 |
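model9 through model14 are six specifications of the same outcome, `base_pay_change`. The notebook already imports `summary_col`, which can lay such models side by side with one row per regressor. That layout makes it easier to see how the race coefficients move as age terms are added. `summary_col` only tabulates what each fit returned, so the ±1e+15-scale gender and age coefficients would appear in it unchanged.

The sketch below runs on hypothetical toy data so that it is self-contained. The column names `raise_amt`, `race` and `age` are illustrative only. In the notebook, the call would take `[result9, result10, result11, result12, result13, result14]`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

# Hypothetical toy data standing in for the merit-raise frame.
rng = np.random.default_rng(1)
n = 300
toy = pd.DataFrame({
    "gender": rng.choice(["Female", "Male"], size=n),
    "race": rng.choice(["white", "poc", "unknown"], size=n),
    "age": rng.integers(22, 66, size=n),
})
toy["raise_amt"] = 3000 + 10 * (toy["age"] - 40) + rng.normal(0, 800, size=n)

# Three nested-style specifications, analogous to model9 / model10 / model11-plus-age.
fits = [
    smf.ols("raise_amt ~ C(gender)", data=toy).fit(),
    smf.ols("raise_amt ~ C(race)", data=toy).fit(),
    smf.ols("raise_amt ~ C(gender) + C(race) + age", data=toy).fit(),
]

# One column per model, one row per regressor, plus an observation count row.
table = summary_col(
    fits,
    model_names=["gender", "race", "gender+race+age"],
    stars=True,
    float_format="%.2f",
    info_dict={"N": lambda r: f"{int(r.nobs)}"},
)
print(table)
```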
model15 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result15 = model15.fit()
result15.summary()
Dep. Variable: | performance_rating | R-squared: | 0.006 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.003 |
Method: | Least Squares | F-statistic: | 2.077 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.126 |
Time: | 20:31:53 | Log-Likelihood: | -225.70 |
No. Observations: | 721 | AIC: | 457.4 |
Df Residuals: | 718 | BIC: | 471.1 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -7.958e+11 | 1.71e+12 | -0.466 | 0.641 | -4.15e+12 | 2.56e+12 |
gender_Female | 7.958e+11 | 1.71e+12 | 0.466 | 0.641 | -2.56e+12 | 4.15e+12 |
gender_Male | 7.958e+11 | 1.71e+12 | 0.466 | 0.641 | -2.56e+12 | 4.15e+12 |
Omnibus: | 26.473 | Durbin-Watson: | 1.816 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 28.572 |
Skew: | 0.486 | Prob(JB): | 6.25e-07 |
Kurtosis: | 3.088 | Cond. No. | 2.94e+14 |
model16 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result16 = model16.fit()
result16.summary()
Dep. Variable: | performance_rating | R-squared: | 0.042 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.040 |
Method: | Least Squares | F-statistic: | 15.84 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.86e-07 |
Time: | 20:31:53 | Log-Likelihood: | -212.21 |
No. Observations: | 721 | AIC: | 430.4 |
Df Residuals: | 718 | BIC: | 444.2 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.7786 | 0.087 | 43.441 | 0.000 | 3.608 | 3.949 |
race_grouping_white | -0.1898 | 0.088 | -2.155 | 0.031 | -0.363 | -0.017 |
race_grouping_person_of_color | -0.3378 | 0.091 | -3.721 | 0.000 | -0.516 | -0.160 |
Omnibus: | 17.586 | Durbin-Watson: | 1.801 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 18.233 |
Skew: | 0.384 | Prob(JB): | 0.000110 |
Kurtosis: | 3.130 | Cond. No. | 16.2 |
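In model16 the intercept is estimable (Cond. No. 16.6), so some rows belong to neither `race_grouping_white` nor `race_grouping_person_of_color`. Each race coefficient therefore compares its group with those remaining rows, not white employees with employees of color. The direct white vs. person-of-color difference here is -0.1898 - (-0.3378) = 0.148 on the `performance_rating` scale. That difference can be tested as a linear hypothesis on the fitted results with `t_test`.

The sketch below uses hypothetical toy data with the same dummy layout. On the notebook's objects, the call would be `result16.t_test("race_grouping_white = race_grouping_person_of_color")`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical toy data: three race groupings, "unknown" acts as the omitted base.
rng = np.random.default_rng(2)
n = 400
group = rng.choice(["white", "person of color", "unknown"], size=n, p=[0.5, 0.4, 0.1])
toy = pd.DataFrame({
    "race_grouping_white": (group == "white").astype(int),
    "race_grouping_person_of_color": (group == "person of color").astype(int),
})
toy["performance_rating"] = (
    3.6 - 0.15 * toy["race_grouping_person_of_color"] + rng.normal(0, 0.4, size=n)
)

fit = smf.ols(
    "performance_rating ~ race_grouping_white + race_grouping_person_of_color",
    data=toy,
).fit()

# Test white vs. person of color directly, rather than each vs. the omitted base.
gap_test = fit.t_test("race_grouping_white = race_grouping_person_of_color")
print(gap_test)

gap = fit.params["race_grouping_white"] - fit.params["race_grouping_person_of_color"]
print(f"estimated white minus person-of-color gap: {gap:.3f}")
```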
model17 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result17 = model17.fit()
result17.summary()
Dep. Variable: | performance_rating | R-squared: | 0.046 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.041 |
Method: | Least Squares | F-statistic: | 8.598 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 8.83e-07 |
Time: | 20:31:54 | Log-Likelihood: | -210.86 |
No. Observations: | 721 | AIC: | 431.7 |
Df Residuals: | 716 | BIC: | 454.6 |
Df Model: | 4 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -8.661e+11 | 1.67e+12 | -0.518 | 0.605 | -4.15e+12 | 2.42e+12 |
gender_Female | 8.661e+11 | 1.67e+12 | 0.518 | 0.605 | -2.42e+12 | 4.15e+12 |
gender_Male | 8.661e+11 | 1.67e+12 | 0.518 | 0.605 | -2.42e+12 | 4.15e+12 |
race_grouping_white | -0.1978 | 0.088 | -2.245 | 0.025 | -0.371 | -0.025 |
race_grouping_person_of_color | -0.3403 | 0.091 | -3.750 | 0.000 | -0.518 | -0.162 |
Omnibus: | 18.649 | Durbin-Watson: | 1.790 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 19.450 |
Skew: | 0.398 | Prob(JB): | 5.98e-05 |
Kurtosis: | 3.111 | Cond. No. | 3.52e+14 |
model18 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result18 = model18.fit()
result18.summary()
Dep. Variable: | performance_rating | R-squared: | 0.043 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.029 |
Method: | Least Squares | F-statistic: | 3.172 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000542 |
Time: | 20:31:54 | Log-Likelihood: | -212.02 |
No. Observations: | 721 | AIC: | 446.0 |
Df Residuals: | 710 | BIC: | 496.4 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -9.526e+11 | 1.43e+12 | -0.664 | 0.507 | -3.77e+12 | 1.86e+12 |
gender_Female | 1.109e+12 | 1.67e+12 | 0.664 | 0.507 | -2.17e+12 | 4.39e+12 |
gender_Male | 1.109e+12 | 1.67e+12 | 0.664 | 0.507 | -2.17e+12 | 4.39e+12 |
age_group_5_25_under | -6.634e+09 | 9.99e+09 | -0.664 | 0.507 | -2.62e+10 | 1.3e+10 |
age_group_5_25to29 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_30to34 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_35to39 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_40to44 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_45to49 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_50to54 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_55to59 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_60to64 | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
age_group_5_65_over | -1.565e+11 | 2.36e+11 | -0.664 | 0.507 | -6.19e+11 | 3.06e+11 |
Omnibus: | 19.660 | Durbin-Watson: | 1.843 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 20.700 |
Skew: | 0.414 | Prob(JB): | 3.20e-05 |
Kurtosis: | 3.050 | Cond. No. | 2.89e+16 |
model19 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result19 = model19.fit()
result19.summary()
Dep. Variable: | performance_rating | R-squared: | 0.063 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.050 |
Method: | Least Squares | F-statistic: | 4.805 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.05e-06 |
Time: | 20:31:54 | Log-Likelihood: | -204.17 |
No. Observations: | 721 | AIC: | 430.3 |
Df Residuals: | 710 | BIC: | 480.7 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 6.792e+11 | 2.69e+12 | 0.253 | 0.801 | -4.6e+12 | 5.96e+12 |
race_grouping_white | -0.2004 | 0.092 | -2.168 | 0.030 | -0.382 | -0.019 |
race_grouping_person_of_color | -0.3231 | 0.092 | -3.523 | 0.000 | -0.503 | -0.143 |
age_group_5_25_under | -4.609e+08 | 1.82e+09 | -0.253 | 0.801 | -4.04e+09 | 3.12e+09 |
age_group_5_25to29 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_30to34 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_35to39 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_40to44 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_45to49 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_50to54 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_55to59 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_60to64 | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
age_group_5_65_over | -6.792e+11 | 2.69e+12 | -0.253 | 0.801 | -5.96e+12 | 4.6e+12 |
Omnibus: | 14.925 | Durbin-Watson: | 1.832 |
---|---|---|---|
Prob(Omnibus): | 0.001 | Jarque-Bera (JB): | 15.278 |
Skew: | 0.349 | Prob(JB): | 0.000481 |
Kurtosis: | 3.146 | Cond. No. | 2.53e+16 |
model20 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result20 = model20.fit()
result20.summary()
Dep. Variable: | performance_rating | R-squared: | 0.073 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.058 |
Method: | Least Squares | F-statistic: | 5.060 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.22e-07 |
Time: | 20:31:54 | Log-Likelihood: | -200.53 |
No. Observations: | 721 | AIC: | 425.1 |
Df Residuals: | 709 | BIC: | 480.0 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -1.035e+12 | 1.37e+12 | -0.756 | 0.450 | -3.72e+12 | 1.65e+12 |
gender_Female | 1.223e+12 | 1.62e+12 | 0.756 | 0.450 | -1.95e+12 | 4.4e+12 |
gender_Male | 1.223e+12 | 1.62e+12 | 0.756 | 0.450 | -1.95e+12 | 4.4e+12 |
race_grouping_white | -0.2016 | 0.088 | -2.281 | 0.023 | -0.375 | -0.028 |
race_grouping_person_of_color | -0.3224 | 0.091 | -3.555 | 0.000 | -0.500 | -0.144 |
age_group_5_25_under | -6.881e+09 | 9.11e+09 | -0.756 | 0.450 | -2.48e+10 | 1.1e+10 |
age_group_5_25to29 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_30to34 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_35to39 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_40to44 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_45to49 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_50to54 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_55to59 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_60to64 | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
age_group_5_65_over | -1.878e+11 | 2.49e+11 | -0.756 | 0.450 | -6.76e+11 | 3e+11 |
Omnibus: | 14.641 | Durbin-Watson: | 1.826 |
---|---|---|---|
Prob(Omnibus): | 0.001 | Jarque-Bera (JB): | 15.004 |
Skew: | 0.349 | Prob(JB): | 0.000552 |
Kurtosis: | 3.109 | Cond. No. | 1.13e+17 |
news_hourly_regression = news_hourly[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
news_hourly_regression = pd.get_dummies(news_hourly_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])
news_hourly_regression = news_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
import statsmodels.formula.api as sm  # formula interface providing sm.ols; rebinds the sm alias
model21 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
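result21 = model21.fit()
# The summary shown below lists race terms and 657 observations: it was produced when this line
# called model2.fit(), not model21.fit(), so it does not describe the hourly gender model.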
result21.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.027 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.024 |
Method: | Least Squares | F-statistic: | 8.957 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000145 |
Time: | 20:31:54 | Log-Likelihood: | -7945.3 |
No. Observations: | 657 | AIC: | 1.590e+04 |
Df Residuals: | 654 | BIC: | 1.591e+04 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.256e+05 | 6939.188 | 18.102 | 0.000 | 1.12e+05 | 1.39e+05 |
race_grouping_white | 574.5152 | 7242.215 | 0.079 | 0.937 | -1.36e+04 | 1.48e+04 |
race_grouping_person_of_color | -1.549e+04 | 7650.339 | -2.024 | 0.043 | -3.05e+04 | -464.535 |
Omnibus: | 167.823 | Durbin-Watson: | 1.711 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 374.307 |
Skew: | 1.366 | Prob(JB): | 5.25e-82 |
Kurtosis: | 5.492 | Cond. No. | 9.02 |
model22 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result22 = model22.fit()
result22.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.020 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.005 |
Method: | Least Squares | F-statistic: | 0.8111 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.448 |
Time: | 20:31:54 | Log-Likelihood: | -330.05 |
No. Observations: | 83 | AIC: | 666.1 |
Df Residuals: | 80 | BIC: | 673.4 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 40.8000 | 13.144 | 3.104 | 0.003 | 14.643 | 66.957 |
race_grouping_white | -4.7002 | 13.256 | -0.355 | 0.724 | -31.081 | 21.681 |
race_grouping_person_of_color | -8.5133 | 13.415 | -0.635 | 0.527 | -35.209 | 18.183 |
Omnibus: | 10.792 | Durbin-Watson: | 1.417 |
---|---|---|---|
Prob(Omnibus): | 0.005 | Jarque-Bera (JB): | 11.225 |
Skew: | 0.743 | Prob(JB): | 0.00365 |
Kurtosis: | 4.020 | Cond. No. | 20.0 |
model23 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result23 = model23.fit()
result23.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.064 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.029 |
Method: | Least Squares | F-statistic: | 1.813 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.152 |
Time: | 20:31:54 | Log-Likelihood: | -328.12 |
No. Observations: | 83 | AIC: | 664.2 |
Df Residuals: | 79 | BIC: | 673.9 |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 25.3109 | 8.670 | 2.919 | 0.005 | 8.054 | 42.568 |
gender_Female | 15.4891 | 4.416 | 3.507 | 0.001 | 6.699 | 24.279 |
gender_Male | 9.8218 | 4.728 | 2.078 | 0.041 | 0.412 | 19.232 |
race_grouping_white | -2.2573 | 13.094 | -0.172 | 0.864 | -28.321 | 23.806 |
race_grouping_person_of_color | -6.6242 | 13.225 | -0.501 | 0.618 | -32.948 | 19.699 |
Omnibus: | 7.875 | Durbin-Watson: | 1.345 |
---|---|---|---|
Prob(Omnibus): | 0.019 | Jarque-Bera (JB): | 7.480 |
Skew: | 0.612 | Prob(JB): | 0.0238 |
Kurtosis: | 3.816 | Cond. No. | 3.52e+15 |
new_news_hourly_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_news_hourly_regression['predicted'] = result23.predict(new_news_hourly_regression)
new_news_hourly_regression
|  | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | age | predicted |
---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 40 | 38.54 |
1 | 0 | 1 | 1 | 0 | 40 | 32.88 |
2 | 1 | 0 | 0 | 1 | 40 | 34.18 |
3 | 0 | 1 | 0 | 1 | 40 | 28.51 |
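`predict` only uses the columns named in the model's formula, so the `age` column in `new_news_hourly_regression` has no effect on these four predictions. Each predicted value is the intercept plus the coefficients of the dummies set to 1, which can be checked by hand from the model23 summary above.

model23 includes both gender dummies alongside the intercept (Cond. No. 3.52e+15). The intercept and gender coefficients therefore individually reflect the pseudoinverse solution, but their sums, and hence the predictions, are well defined.

```python
# Coefficients copied from the model23 summary above.
coef = {
    "Intercept": 25.3109,
    "gender_Female": 15.4891,
    "gender_Male": 9.8218,
    "race_grouping_white": -2.2573,
    "race_grouping_person_of_color": -6.6242,
}

# The four rows of new_news_hourly_regression (age omitted: not in the formula).
rows = [
    {"gender_Female": 1, "gender_Male": 0, "race_grouping_white": 1, "race_grouping_person_of_color": 0},
    {"gender_Female": 0, "gender_Male": 1, "race_grouping_white": 1, "race_grouping_person_of_color": 0},
    {"gender_Female": 1, "gender_Male": 0, "race_grouping_white": 0, "race_grouping_person_of_color": 1},
    {"gender_Female": 0, "gender_Male": 1, "race_grouping_white": 0, "race_grouping_person_of_color": 1},
]
printed = [38.54, 32.88, 34.18, 28.51]  # "predicted" column above


def linear_prediction(row):
    """Intercept plus each coefficient times its dummy value."""
    return coef["Intercept"] + sum(coef[name] * value for name, value in row.items())


by_hand = [linear_prediction(r) for r in rows]
for manual, shown in zip(by_hand, printed):
    print(f"by hand {manual:.4f} vs printed {shown:.2f}")
```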
model24 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result24 = model24.fit()
result24.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.363 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.275 |
Method: | Least Squares | F-statistic: | 4.108 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000170 |
Time: | 20:31:54 | Log-Likelihood: | -312.15 |
No. Observations: | 83 | AIC: | 646.3 |
Df Residuals: | 72 | BIC: | 672.9 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
|  | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 23.2877 | 0.882 | 26.410 | 0.000 | 21.530 | 25.046 |
gender_Female | 14.1092 | 1.294 | 10.905 | 0.000 | 11.530 | 16.688 |
gender_Male | 9.1785 | 1.454 | 6.314 | 0.000 | 6.281 | 12.076 |
age_group_5_25_under | -7.0396 | 3.027 | -2.325 | 0.023 | -13.075 | -1.005 |
age_group_5_25to29 | -8.5046 | 2.910 | -2.923 | 0.005 | -14.305 | -2.704 |
age_group_5_30to34 | -3.2016 | 2.998 | -1.068 | 0.289 | -9.177 | 2.774 |
age_group_5_35to39 | 3.6055 | 4.713 | 0.765 | 0.447 | -5.790 | 13.001 |
age_group_5_40to44 | -5.4847 | 4.685 | -1.171 | 0.246 | -14.824 | 3.855 |
age_group_5_45to49 | 11.0866 | 5.957 | 1.861 | 0.067 | -0.789 | 22.963 |
age_group_5_50to54 | 6.0747 | 4.032 | 1.507 | 0.136 | -1.962 | 14.111 |
age_group_5_55to59 | 5.0734 | 4.020 | 1.262 | 0.211 | -2.941 | 13.088 |
age_group_5_60to64 | 7.2149 | 4.044 | 1.784 | 0.079 | -0.846 | 15.276 |
age_group_5_65_over | 14.4632 | 4.353 | 3.322 | 0.001 | 5.785 | 23.141 |
Omnibus: | 2.772 | Durbin-Watson: | 1.665 |
---|---|---|---|
Prob(Omnibus): | 0.250 | Jarque-Bera (JB): | 2.066 |
Skew: | 0.283 | Prob(JB): | 0.356 |
Kurtosis: | 3.525 | Cond. No. | 8.06e+15 |
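These models include an intercept together with a full set of dummies for each variable (both `gender_Female` and `gender_Male`, and every `age_group_5_*` column), so the design matrix is rank-deficient. The condition numbers of roughly 1e+15 and the `Df Model` values smaller than the number of regressors (10 for the 12 regressors in model24) flag this. statsmodels' `OLS.fit()` defaults to `method='pinv'`, which returns the minimum-norm least-squares solution, so the individual dummy coefficients are not separately identified. In model24, for example, the Intercept (23.2877) equals gender_Female + gender_Male (14.1092 + 9.1785). Differences between levels of the same variable, and fitted or predicted values, are still identified. The sketch below uses made-up numbers (not Post data) and plain NumPy to show that the redundant coding and a reference-category coding give the same fitted values and the same Female minus Male gap.

```python
import numpy as np

# Made-up illustration data (NOT from the Post files): hourly pay for six
# hypothetical employees, the first three female and the last three male.
pay = np.array([30.0, 34.0, 38.0, 26.0, 28.0, 33.0])
female = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
male = 1.0 - female
ones = np.ones_like(pay)

# Intercept plus BOTH dummies, as in the formulas above: the columns are
# collinear because female + male == intercept.
X_full = np.column_stack([ones, female, male])
# Intercept plus one dummy: Male is the reference category.
X_ref = np.column_stack([ones, female])

print(np.linalg.matrix_rank(X_full))  # 2, although X_full has 3 columns

# The pseudoinverse gives the minimum-norm least-squares solution.
beta_full = np.linalg.pinv(X_full) @ pay
beta_ref = np.linalg.pinv(X_ref) @ pay

print(beta_full)  # intercept equals female coef + male coef
print(beta_ref)   # [male mean, female mean - male mean]

# Same fitted values and the same Female - Male gap under both codings.
assert np.allclose(X_full @ beta_full, X_ref @ beta_ref)
assert np.isclose(beta_full[1] - beta_full[2], beta_ref[1])
```

Dropping one dummy per variable from the formula (for example `gender_Male` and one `age_group_5_*` column) would give a full-rank design. The coefficients would then read as gaps relative to the dropped group, while predictions such as the tabulated `predicted` values would not change.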
model25 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result25 = model25.fit()
result25.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.338 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.235 |
Method: | Least Squares | F-statistic: | 3.294 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.00108 |
Time: | 20:31:54 | Log-Likelihood: | -313.77 |
No. Observations: | 83 | AIC: | 651.5 |
Df Residuals: | 71 | BIC: | 680.6 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 43.1529 | 10.818 | 3.989 | 0.000 | 21.582 | 64.724 |
race_grouping_white | -9.7635 | 11.921 | -0.819 | 0.416 | -33.533 | 14.006 |
race_grouping_person_of_color | -9.7817 | 12.133 | -0.806 | 0.423 | -33.974 | 14.410 |
age_group_5_25_under | -4.0782 | 3.419 | -1.193 | 0.237 | -10.896 | 2.739 |
age_group_5_25to29 | -6.7920 | 3.202 | -2.121 | 0.037 | -13.178 | -0.406 |
age_group_5_30to34 | -2.3529 | 3.075 | -0.765 | 0.447 | -8.484 | 3.778 |
age_group_5_35to39 | 4.6546 | 5.011 | 0.929 | 0.356 | -5.338 | 14.647 |
age_group_5_40to44 | -3.4384 | 5.070 | -0.678 | 0.500 | -13.548 | 6.671 |
age_group_5_45to49 | 13.4506 | 6.282 | 2.141 | 0.036 | 0.926 | 25.976 |
age_group_5_50to54 | 8.6761 | 4.300 | 2.018 | 0.047 | 0.103 | 17.250 |
age_group_5_55to59 | 6.9730 | 4.289 | 1.626 | 0.108 | -1.579 | 15.525 |
age_group_5_60to64 | 8.4049 | 4.355 | 1.930 | 0.058 | -0.279 | 17.089 |
age_group_5_65_over | 17.6550 | 4.584 | 3.851 | 0.000 | 8.515 | 26.796 |
Omnibus: | 6.397 | Durbin-Watson: | 1.685 |
---|---|---|---|
Prob(Omnibus): | 0.041 | Jarque-Bera (JB): | 5.827 |
Skew: | 0.524 | Prob(JB): | 0.0543 |
Kurtosis: | 3.766 | Cond. No. | 6.94e+15 |
model26 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result26 = model26.fit()
result26.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.367 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.258 |
Method: | Least Squares | F-statistic: | 3.377 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000642 |
Time: | 20:31:54 | Log-Likelihood: | -311.92 |
No. Observations: | 83 | AIC: | 649.8 |
Df Residuals: | 70 | BIC: | 681.3 |
Df Model: | 12 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 27.7804 | 7.402 | 3.753 | 0.000 | 13.017 | 42.544 |
gender_Female | 16.2585 | 3.749 | 4.336 | 0.000 | 8.781 | 23.736 |
gender_Male | 11.5220 | 4.107 | 2.805 | 0.007 | 3.330 | 19.714 |
race_grouping_white | -7.1864 | 11.831 | -0.607 | 0.546 | -30.782 | 16.410 |
race_grouping_person_of_color | -7.3203 | 12.030 | -0.608 | 0.545 | -31.314 | 16.674 |
age_group_5_25_under | -6.4603 | 3.329 | -1.941 | 0.056 | -13.099 | 0.179 |
age_group_5_25to29 | -8.0062 | 3.060 | -2.616 | 0.011 | -14.109 | -1.903 |
age_group_5_30to34 | -3.2389 | 3.033 | -1.068 | 0.289 | -9.288 | 2.810 |
age_group_5_35to39 | 4.0334 | 4.883 | 0.826 | 0.412 | -5.706 | 13.773 |
age_group_5_40to44 | -4.9376 | 4.925 | -1.003 | 0.320 | -14.760 | 4.885 |
age_group_5_45to49 | 11.5663 | 6.131 | 1.886 | 0.063 | -0.662 | 23.795 |
age_group_5_50to54 | 6.5828 | 4.179 | 1.575 | 0.120 | -1.753 | 14.918 |
age_group_5_55to59 | 5.5728 | 4.148 | 1.344 | 0.183 | -2.700 | 13.845 |
age_group_5_60to64 | 7.6483 | 4.227 | 1.809 | 0.075 | -0.782 | 16.079 |
age_group_5_65_over | 15.0199 | 4.499 | 3.338 | 0.001 | 6.046 | 23.994 |
Omnibus: | 3.153 | Durbin-Watson: | 1.648 |
---|---|---|---|
Prob(Omnibus): | 0.207 | Jarque-Bera (JB): | 2.433 |
Skew: | 0.314 | Prob(JB): | 0.296 |
Kurtosis: | 3.555 | Cond. No. | 8.54e+15 |
# model27 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4')
# result27 = model27.fit()
# result27.summary()
# model28 = sm.ols(data=news_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over + tier_Tier_1 + tier_Tier_2 + tier_Tier_3 + tier_Tier_4 + years_of_service_grouped_0 + years_of_service_grouped_1to2 + years_of_service_grouped_3to5 + years_of_service_grouped_6to10 + years_of_service_grouped_11to15 + years_of_service_grouped_16to20 + years_of_service_grouped_21to25 + years_of_service_grouped_25_over')
# result28 = model28.fit()
# result28.summary()
merit_raises_combined_hourly_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'News') & (merit_raises_combined['pay_rate_type'] == 'Hourly')]
merit_raises_combined_hourly_regression = pd.get_dummies(merit_raises_combined_hourly_regression, columns=['gender','race_grouping','age_group_5'])
merit_raises_combined_hourly_regression = merit_raises_combined_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
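The hand-written `rename` mapping above could also be derived from the `get_dummies` column names, because `DataFrame.rename(columns=...)` accepts a function as well as a dict. The sketch below assumes the raw labels follow only the patterns visible in that mapping (`<25`, `NN-NN`, `65+`, and the spaces in `person of color`). `clean_dummy_name` is a hypothetical helper, not part of the original notebook.

```python
import re


def clean_dummy_name(col: str) -> str:
    """Turn a pandas.get_dummies column name into a formula-safe identifier."""
    col = col.replace(' ', '_')                    # 'person of color' -> 'person_of_color'
    col = re.sub(r'<(\d+)$', r'\1_under', col)     # '<25'  -> '25_under'
    col = re.sub(r'(\d+)\+$', r'\1_over', col)     # '65+'  -> '65_over'
    col = re.sub(r'(\d+)-(\d+)$', r'\1to\2', col)  # '25-29' -> '25to29'
    return col


for raw in ['race_grouping_person of color', 'age_group_5_<25',
            'age_group_5_25-29', 'age_group_5_65+', 'gender_Female']:
    print(raw, '->', clean_dummy_name(raw))
```

Applied as `df.rename(columns=clean_dummy_name)`, the function touches every column that matches one of these patterns, so the resulting names are worth checking against the formulas.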
model29 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result29 = model29.fit()
result29.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.007 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.003 |
Method: | Least Squares | F-statistic: | 0.7160 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.399 |
Time: | 20:31:54 | Log-Likelihood: | -197.46 |
No. Observations: | 105 | AIC: | 398.9 |
Df Residuals: | 103 | BIC: | 404.2 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 1.0379 | 0.108 | 9.624 | 0.000 | 0.824 | 1.252 |
gender_Female | 0.6559 | 0.157 | 4.182 | 0.000 | 0.345 | 0.967 |
gender_Male | 0.3821 | 0.183 | 2.085 | 0.040 | 0.019 | 0.745 |
Omnibus: | 123.881 | Durbin-Watson: | 1.909 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2564.506 |
Skew: | 4.023 | Prob(JB): | 0.00 |
Kurtosis: | 25.835 | Cond. No. | 3.52e+15 |
model30 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result30 = model30.fit()
result30.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.029 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.019 |
Method: | Least Squares | F-statistic: | 3.036 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0844 |
Time: | 20:31:54 | Log-Likelihood: | -196.30 |
No. Observations: | 105 | AIC: | 396.6 |
Df Residuals: | 103 | BIC: | 401.9 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 0.9666 | 0.117 | 8.294 | 0.000 | 0.735 | 1.198 |
race_grouping_white | 0.7879 | 0.156 | 5.039 | 0.000 | 0.478 | 1.098 |
race_grouping_person_of_color | 0.1787 | 0.208 | 0.857 | 0.393 | -0.235 | 0.592 |
Omnibus: | 123.450 | Durbin-Watson: | 1.778 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2635.259 |
Skew: | 3.980 | Prob(JB): | 0.00 |
Kurtosis: | 26.216 | Cond. No. | 2.38e+15 |
model31 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result31 = model31.fit()
result31.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.033 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.014 |
Method: | Least Squares | F-statistic: | 1.725 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.183 |
Time: | 20:31:54 | Log-Likelihood: | -196.08 |
No. Observations: | 105 | AIC: | 398.2 |
Df Residuals: | 102 | BIC: | 406.1 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 0.7145 | 0.089 | 8.019 | 0.000 | 0.538 | 0.891 |
gender_Female | 0.4633 | 0.160 | 2.901 | 0.005 | 0.147 | 0.780 |
gender_Male | 0.2512 | 0.175 | 1.435 | 0.154 | -0.096 | 0.599 |
race_grouping_white | 0.6484 | 0.162 | 3.999 | 0.000 | 0.327 | 0.970 |
race_grouping_person_of_color | 0.0661 | 0.200 | 0.330 | 0.742 | -0.331 | 0.463 |
Omnibus: | 122.482 | Durbin-Watson: | 1.806 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2547.850 |
Skew: | 3.943 | Prob(JB): | 0.00 |
Kurtosis: | 25.808 | Cond. No. | 4.38e+15 |
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result31.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression
| | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | predicted |
|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 1.83 |
1 | 0 | 1 | 1 | 0 | 1.61 |
2 | 1 | 0 | 0 | 1 | 1.24 |
3 | 0 | 1 | 0 | 1 | 1.03 |
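Because `model31` has no interaction terms, each `predicted` value is the Intercept plus the coefficients of the two dummies that are 1 in that row. A plain-Python check using the rounded coefficients printed in the `result31` summary reproduces the table:

```python
# Coefficients copied from the result31 summary above (rounded to 4 decimals).
coef = {
    'Intercept': 0.7145,
    'gender_Female': 0.4633,
    'gender_Male': 0.2512,
    'race_grouping_white': 0.6484,
    'race_grouping_person_of_color': 0.0661,
}

# Same row order as the prediction table (rows 0-3).
rows = [
    ('gender_Female', 'race_grouping_white'),
    ('gender_Male', 'race_grouping_white'),
    ('gender_Female', 'race_grouping_person_of_color'),
    ('gender_Male', 'race_grouping_person_of_color'),
]

for gender, race in rows:
    predicted = coef['Intercept'] + coef[gender] + coef[race]
    print(f'{gender:14} {race:30} {predicted:.2f}')  # 1.83, 1.61, 1.24, 1.03
```

The additive form also means the Female minus Male gap is the same within each race group: 0.4633 - 0.2512 = 0.2121.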
model32 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result32 = model32.fit()
result32.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.059 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.030 |
Method: | Least Squares | F-statistic: | 0.6599 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.743 |
Time: | 20:31:54 | Log-Likelihood: | -194.64 |
No. Observations: | 105 | AIC: | 409.3 |
Df Residuals: | 95 | BIC: | 435.8 |
Df Model: | 9 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 1.0026 | 0.109 | 9.210 | 0.000 | 0.786 | 1.219 |
gender_Female | 0.6319 | 0.176 | 3.600 | 0.001 | 0.283 | 0.980 |
gender_Male | 0.3707 | 0.194 | 1.909 | 0.059 | -0.015 | 0.756 |
age_group_5_25_under | 3.923e-16 | 2.08e-16 | 1.885 | 0.063 | -2.1e-17 | 8.05e-16 |
age_group_5_25to29 | 0.3173 | 0.421 | 0.753 | 0.453 | -0.519 | 1.153 |
age_group_5_30to34 | -0.2924 | 0.369 | -0.793 | 0.430 | -1.024 | 0.439 |
age_group_5_35to39 | -0.0689 | 0.448 | -0.154 | 0.878 | -0.959 | 0.821 |
age_group_5_40to44 | 0.0943 | 0.613 | 0.154 | 0.878 | -1.122 | 1.310 |
age_group_5_45to49 | 0.3428 | 0.448 | 0.765 | 0.446 | -0.547 | 1.233 |
age_group_5_50to54 | 0.2283 | 0.451 | 0.506 | 0.614 | -0.667 | 1.124 |
age_group_5_55to59 | -0.4424 | 0.404 | -1.095 | 0.276 | -1.244 | 0.359 |
age_group_5_60to64 | 0.0188 | 0.698 | 0.027 | 0.979 | -1.366 | 1.404 |
age_group_5_65_over | 0.8047 | 0.513 | 1.569 | 0.120 | -0.213 | 1.823 |
Omnibus: | 121.722 | Durbin-Watson: | 1.800 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2568.325 |
Skew: | 3.893 | Prob(JB): | 0.00 |
Kurtosis: | 25.944 | Cond. No. | 1.85e+17 |
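In the merit-raise models, `age_group_5_25_under` gets a coefficient on the order of 1e-16 with a standard error of the same order. This is what a pseudoinverse fit returns for a column that never varies, which suggests (but does not prove) that the column is all zeros in this hourly News merit-raise subset. A sketch for flagging such columns before building a formula; `constant_columns` is a hypothetical helper and the toy frame is made up:

```python
import pandas as pd


def constant_columns(df: pd.DataFrame, prefix: str):
    """Return columns starting with `prefix` that never vary (e.g. all zeros)."""
    cols = [c for c in df.columns if c.startswith(prefix)]
    return [c for c in cols if df[c].nunique(dropna=False) <= 1]


# Made-up example: nobody falls in the under-25 bucket.
toy = pd.DataFrame({
    'age_group_5_25_under': [0, 0, 0],
    'age_group_5_25to29': [1, 0, 0],
    'age_group_5_30to34': [0, 1, 1],
})

empty = constant_columns(toy, 'age_group_5_')
print(empty)  # ['age_group_5_25_under']

# Build the right-hand side from the columns that actually vary.
keep = [c for c in toy.columns if c.startswith('age_group_5_') and c not in empty]
formula = 'base_pay_change ~ ' + ' + '.join(keep)
print(formula)
```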
model33 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result33 = model33.fit()
result33.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.087 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.001 |
Method: | Least Squares | F-statistic: | 1.012 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.436 |
Time: | 20:31:55 | Log-Likelihood: | -193.02 |
No. Observations: | 105 | AIC: | 406.0 |
Df Residuals: | 95 | BIC: | 432.6 |
Df Model: | 9 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 0.9062 | 0.121 | 7.505 | 0.000 | 0.667 | 1.146 |
race_grouping_white | 0.8027 | 0.165 | 4.866 | 0.000 | 0.475 | 1.130 |
race_grouping_person_of_color | 0.1036 | 0.221 | 0.468 | 0.641 | -0.336 | 0.543 |
age_group_5_25_under | -4.956e-17 | 2.88e-16 | -0.172 | 0.864 | -6.22e-16 | 5.23e-16 |
age_group_5_25to29 | 0.4679 | 0.417 | 1.123 | 0.264 | -0.359 | 1.295 |
age_group_5_30to34 | -0.2560 | 0.364 | -0.704 | 0.483 | -0.978 | 0.466 |
age_group_5_35to39 | -0.2739 | 0.451 | -0.607 | 0.545 | -1.169 | 0.622 |
age_group_5_40to44 | 0.1658 | 0.603 | 0.275 | 0.784 | -1.031 | 1.363 |
age_group_5_45to49 | 0.2543 | 0.442 | 0.575 | 0.566 | -0.623 | 1.132 |
age_group_5_50to54 | 0.2634 | 0.440 | 0.598 | 0.551 | -0.611 | 1.138 |
age_group_5_55to59 | -0.3036 | 0.392 | -0.775 | 0.440 | -1.081 | 0.474 |
age_group_5_60to64 | -0.3169 | 0.663 | -0.478 | 0.634 | -1.634 | 1.000 |
age_group_5_65_over | 0.9053 | 0.501 | 1.807 | 0.074 | -0.089 | 1.900 |
Omnibus: | 118.288 | Durbin-Watson: | 1.675 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2407.066 |
Skew: | 3.731 | Prob(JB): | 0.00 |
Kurtosis: | 25.238 | Cond. No. | 1.07e+17 |
model34 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result34 = model34.fit()
result34.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.088 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.008 |
Method: | Least Squares | F-statistic: | 0.9125 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.525 |
Time: | 20:31:55 | Log-Likelihood: | -192.96 |
No. Observations: | 105 | AIC: | 407.9 |
Df Residuals: | 94 | BIC: | 437.1 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 0.6901 | 0.093 | 7.445 | 0.000 | 0.506 | 0.874 |
gender_Female | 0.4033 | 0.184 | 2.197 | 0.030 | 0.039 | 0.768 |
gender_Male | 0.2869 | 0.188 | 1.527 | 0.130 | -0.086 | 0.660 |
race_grouping_white | 0.6803 | 0.175 | 3.883 | 0.000 | 0.332 | 1.028 |
race_grouping_person_of_color | 0.0098 | 0.217 | 0.045 | 0.964 | -0.421 | 0.441 |
age_group_5_25_under | -5.684e-17 | 1.2e-16 | -0.473 | 0.638 | -2.96e-16 | 1.82e-16 |
age_group_5_25to29 | 0.4240 | 0.424 | 1.001 | 0.320 | -0.417 | 1.265 |
age_group_5_30to34 | -0.2809 | 0.366 | -0.768 | 0.445 | -1.007 | 0.446 |
age_group_5_35to39 | -0.2806 | 0.456 | -0.615 | 0.540 | -1.186 | 0.625 |
age_group_5_40to44 | 0.1302 | 0.607 | 0.215 | 0.830 | -1.074 | 1.334 |
age_group_5_45to49 | 0.2429 | 0.446 | 0.545 | 0.587 | -0.642 | 1.128 |
age_group_5_50to54 | 0.2205 | 0.447 | 0.494 | 0.623 | -0.666 | 1.107 |
age_group_5_55to59 | -0.3574 | 0.405 | -0.883 | 0.380 | -1.161 | 0.447 |
age_group_5_60to64 | -0.2654 | 0.705 | -0.376 | 0.708 | -1.665 | 1.135 |
age_group_5_65_over | 0.8567 | 0.509 | 1.683 | 0.096 | -0.154 | 1.868 |
Omnibus: | 118.098 | Durbin-Watson: | 1.695 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 2383.378 |
Skew: | 3.725 | Prob(JB): | 0.00 |
Kurtosis: | 25.119 | Cond. No. | 7.28e+16 |
model35 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result35 = model35.fit()
result35.summary()
Dep. Variable: | performance_rating | R-squared: | 0.010 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.001 |
Method: | Least Squares | F-statistic: | 0.9173 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.341 |
Time: | 20:31:55 | Log-Likelihood: | -39.114 |
No. Observations: | 97 | AIC: | 82.23 |
Df Residuals: | 95 | BIC: | 87.38 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 2.3409 | 0.026 | 90.769 | 0.000 | 2.290 | 2.392 |
gender_Female | 1.2075 | 0.037 | 32.445 | 0.000 | 1.134 | 1.281 |
gender_Male | 1.1334 | 0.044 | 25.729 | 0.000 | 1.046 | 1.221 |
Omnibus: | 7.850 | Durbin-Watson: | 1.804 |
---|---|---|---|
Prob(Omnibus): | 0.020 | Jarque-Bera (JB): | 7.255 |
Skew: | 0.600 | Prob(JB): | 0.0266 |
Kurtosis: | 2.404 | Cond. No. | 5.17e+15 |
model36 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result36 = model36.fit()
result36.summary()
Dep. Variable: | performance_rating | R-squared: | 0.057 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.047 |
Method: | Least Squares | F-statistic: | 5.767 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0183 |
Time: | 20:31:55 | Log-Likelihood: | -36.722 |
No. Observations: | 97 | AIC: | 77.44 |
Df Residuals: | 95 | BIC: | 82.59 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 2.3191 | 0.027 | 86.016 | 0.000 | 2.266 | 2.373 |
race_grouping_white | 1.2566 | 0.037 | 34.408 | 0.000 | 1.184 | 1.329 |
race_grouping_person_of_color | 1.0624 | 0.048 | 22.150 | 0.000 | 0.967 | 1.158 |
Omnibus: | 4.825 | Durbin-Watson: | 1.889 |
---|---|---|---|
Prob(Omnibus): | 0.090 | Jarque-Bera (JB): | 4.105 |
Skew: | 0.409 | Prob(JB): | 0.128 |
Kurtosis: | 2.412 | Cond. No. | 2.01e+16 |
model37 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result37 = model37.fit()
result37.summary()
Dep. Variable: | performance_rating | R-squared: | 0.061 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.041 |
Method: | Least Squares | F-statistic: | 3.052 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0520 |
Time: | 20:31:55 | Log-Likelihood: | -36.529 |
No. Observations: | 97 | AIC: | 79.06 |
Df Residuals: | 94 | BIC: | 86.78 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 1.7369 | 0.021 | 84.064 | 0.000 | 1.696 | 1.778 |
gender_Female | 0.8919 | 0.038 | 23.617 | 0.000 | 0.817 | 0.967 |
gender_Male | 0.8450 | 0.042 | 20.336 | 0.000 | 0.762 | 0.927 |
race_grouping_white | 0.9616 | 0.038 | 25.216 | 0.000 | 0.886 | 1.037 |
race_grouping_person_of_color | 0.7753 | 0.046 | 16.782 | 0.000 | 0.684 | 0.867 |
Omnibus: | 5.208 | Durbin-Watson: | 1.901 |
---|---|---|---|
Prob(Omnibus): | 0.074 | Jarque-Bera (JB): | 4.606 |
Skew: | 0.451 | Prob(JB): | 0.100 |
Kurtosis: | 2.431 | Cond. No. | 2.28e+16 |
model38 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result38 = model38.fit()
result38.summary()
Dep. Variable: | performance_rating | R-squared: | 0.145 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.056 |
Method: | Least Squares | F-statistic: | 1.637 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.117 |
Time: | 20:31:55 | Log-Likelihood: | -31.993 |
No. Observations: | 97 | AIC: | 83.99 |
Df Residuals: | 87 | BIC: | 109.7 |
Df Model: | 9 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 2.1907 | 0.025 | 89.100 | 0.000 | 2.142 | 2.240 |
gender_Female | 1.1670 | 0.041 | 28.529 | 0.000 | 1.086 | 1.248 |
gender_Male | 1.0238 | 0.046 | 22.311 | 0.000 | 0.933 | 1.115 |
age_group_5_25_under | -8.104e-17 | 3.6e-17 | -2.253 | 0.027 | -1.53e-16 | -9.56e-18 |
age_group_5_25to29 | 0.0677 | 0.096 | 0.705 | 0.483 | -0.123 | 0.258 |
age_group_5_30to34 | 0.1601 | 0.085 | 1.888 | 0.062 | -0.008 | 0.329 |
age_group_5_35to39 | 0.1113 | 0.103 | 1.084 | 0.281 | -0.093 | 0.315 |
age_group_5_40to44 | 0.2567 | 0.134 | 1.912 | 0.059 | -0.010 | 0.524 |
age_group_5_45to49 | 0.4222 | 0.099 | 4.286 | 0.000 | 0.226 | 0.618 |
age_group_5_50to54 | 0.1531 | 0.099 | 1.545 | 0.126 | -0.044 | 0.350 |
age_group_5_55to59 | 0.2610 | 0.099 | 2.642 | 0.010 | 0.065 | 0.457 |
age_group_5_60to64 | 0.6055 | 0.154 | 3.927 | 0.000 | 0.299 | 0.912 |
age_group_5_65_over | 0.1531 | 0.118 | 1.295 | 0.199 | -0.082 | 0.388 |
Omnibus: | 3.628 | Durbin-Watson: | 1.942 |
---|---|---|---|
Prob(Omnibus): | 0.163 | Jarque-Bera (JB): | 3.556 |
Skew: | 0.422 | Prob(JB): | 0.169 |
Kurtosis: | 2.591 | Cond. No. | 1.19e+18 |
model39 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result39 = model39.fit()
result39.summary()
Dep. Variable: | performance_rating | R-squared: | 0.163 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.077 |
Method: | Least Squares | F-statistic: | 1.885 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0647 |
Time: | 20:31:55 | Log-Likelihood: | -30.940 |
No. Observations: | 97 | AIC: | 81.88 |
Df Residuals: | 87 | BIC: | 107.6 |
Df Model: | 9 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 2.1715 | 0.027 | 80.597 | 0.000 | 2.118 | 2.225 |
race_grouping_white | 1.1795 | 0.038 | 31.191 | 0.000 | 1.104 | 1.255 |
race_grouping_person_of_color | 0.9920 | 0.050 | 19.838 | 0.000 | 0.893 | 1.091 |
age_group_5_25_under | -1.19e-16 | 5.67e-17 | -2.100 | 0.039 | -2.32e-16 | -6.38e-18 |
age_group_5_25to29 | 0.1278 | 0.095 | 1.343 | 0.183 | -0.061 | 0.317 |
age_group_5_30to34 | 0.1740 | 0.084 | 2.068 | 0.042 | 0.007 | 0.341 |
age_group_5_35to39 | 0.0399 | 0.104 | 0.386 | 0.701 | -0.166 | 0.246 |
age_group_5_40to44 | 0.2782 | 0.133 | 2.095 | 0.039 | 0.014 | 0.542 |
age_group_5_45to49 | 0.3886 | 0.098 | 3.983 | 0.000 | 0.195 | 0.583 |
age_group_5_50to54 | 0.1709 | 0.097 | 1.759 | 0.082 | -0.022 | 0.364 |
age_group_5_55to59 | 0.3288 | 0.094 | 3.486 | 0.001 | 0.141 | 0.516 |
age_group_5_60to64 | 0.4690 | 0.146 | 3.206 | 0.002 | 0.178 | 0.760 |
age_group_5_65_over | 0.1943 | 0.117 | 1.667 | 0.099 | -0.037 | 0.426 |
Omnibus: | 2.252 | Durbin-Watson: | 1.971 |
---|---|---|---|
Prob(Omnibus): | 0.324 | Jarque-Bera (JB): | 2.270 |
Skew: | 0.346 | Prob(JB): | 0.321 |
Kurtosis: | 2.713 | Cond. No. | 3.20e+16 |
model40 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result40 = model40.fit()
result40.summary()
Dep. Variable: | performance_rating | R-squared: | 0.175 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.079 |
Method: | Least Squares | F-statistic: | 1.820 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0689 |
Time: | 20:31:55 | Log-Likelihood: | -30.271 |
No. Observations: | 97 | AIC: | 82.54 |
Df Residuals: | 86 | BIC: | 110.9 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
Intercept | 1.6562 | 0.021 | 80.560 | 0.000 | 1.615 | 1.697 |
gender_Female | 0.8756 | 0.044 | 19.823 | 0.000 | 0.788 | 0.963 |
gender_Male | 0.7807 | 0.045 | 17.328 | 0.000 | 0.691 | 0.870 |
race_grouping_white | 0.9065 | 0.041 | 22.069 | 0.000 | 0.825 | 0.988 |
race_grouping_person_of_color | 0.7497 | 0.050 | 15.064 | 0.000 | 0.651 | 0.849 |
age_group_5_25_under | 6.399e-17 | 5.21e-17 | 1.228 | 0.223 | -3.96e-17 | 1.68e-16 |
age_group_5_25to29 | 0.0483 | 0.097 | 0.496 | 0.621 | -0.145 | 0.242 |
age_group_5_30to34 | 0.1150 | 0.084 | 1.365 | 0.176 | -0.052 | 0.282 |
age_group_5_35to39 | 0.0044 | 0.105 | 0.042 | 0.967 | -0.205 | 0.214 |
age_group_5_40to44 | 0.2123 | 0.133 | 1.599 | 0.113 | -0.052 | 0.476 |
age_group_5_45to49 | 0.3437 | 0.098 | 3.503 | 0.001 | 0.149 | 0.539 |
age_group_5_50to54 | 0.0997 | 0.098 | 1.018 | 0.312 | -0.095 | 0.294 |
age_group_5_55to59 | 0.2370 | 0.099 | 2.383 | 0.019 | 0.039 | 0.435 |
age_group_5_60to64 | 0.4766 | 0.158 | 3.026 | 0.003 | 0.164 | 0.790 |
age_group_5_65_over | 0.1193 | 0.118 | 1.014 | 0.313 | -0.114 | 0.353 |
Omnibus: | 2.988 | Durbin-Watson: | 1.997 |
---|---|---|---|
Prob(Omnibus): | 0.224 | Jarque-Bera (JB): | 2.992 |
Skew: | 0.402 | Prob(JB): | 0.224 |
Kurtosis: | 2.693 | Cond. No. | 3.83e+16 |
current_commercial_gender_salaried = commercial_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_gender_salaried)
gender | count_nonzero |
---|---|
Female | 85 |
Male | 41 |
current_commercial_gender_hourly = commercial_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_gender_hourly)
gender | count_nonzero |
---|---|
Female | 75 |
Male | 62 |
current_commercial_gender_salaried_median = commercial_salaried.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_median)
gender | count_nonzero | median |
---|---|---|
Female | 85 | 90702.30 |
Male | 41 | 89382.07 |
current_commercial_gender_hourly_median = commercial_hourly.groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_median)
gender | count_nonzero | median |
---|---|---|
Female | 75 | 31.57 |
Male | 62 | 26.78 |
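The `agg({'current_base_pay': [np.count_nonzero, np.median]})` pattern produces two-level column headers (`current_base_pay` over `count_nonzero` and `median`), and `np.count_nonzero` counts non-zero pay values rather than rows, so it equals the headcount only if no base pay is recorded as exactly 0. A sketch of the same summary with pandas named aggregation, which yields flat column names; the data are made up:

```python
import pandas as pd

# Made-up rows for illustration only; one pay value is deliberately 0.
toy = pd.DataFrame({
    'gender': ['Female', 'Female', 'Male', 'Male', 'Male'],
    'current_base_pay': [31.0, 33.0, 26.0, 0.0, 28.0],
})

summary = toy.groupby('gender').agg(
    headcount=('current_base_pay', 'size'),  # every row, including the 0
    count_nonzero=('current_base_pay', lambda s: int((s != 0).sum())),
    median=('current_base_pay', 'median'),
)
print(summary)
```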
current_commercial_gender_age_salaried = commercial_salaried.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_commercial_gender_age_salaried
gender
Male     40.00
Female   34.00
Name: age, dtype: float64
current_commercial_gender_age_hourly = commercial_hourly.groupby(['gender'])['age'].median().sort_values(ascending=False)
current_commercial_gender_age_hourly
gender
Male     47.00
Female   40.00
Name: age, dtype: float64
current_commercial_gender_age_5_salary = commercial_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_5_salary)
age_group_5 | gender | count_nonzero | median |
---|---|---|---|
25-29 | Female | 30.00 | 83389.64 |
| Male | 9.00 | 90000.00 |
30-34 | Female | 13.00 | 97695.60 |
35-39 | Female | 14.00 | 133890.00 |
| Male | 5.00 | 80000.00 |
45-49 | Female | 11.00 | 123000.00 |
50-54 | Male | 5.00 | 89382.07 |
55-59 | Female | 8.00 | 97813.00 |
| Male | 6.00 | 92503.83 |
65+ | Male | 6.00 | 93162.67 |
current_commercial_gender_age_5_hourly = commercial_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_5_hourly)
age_group_5 | gender | count_nonzero | median |
---|---|---|---|
<25 | Female | 6 | 30.12 |
25-29 | Female | 18 | 35.13 |
| Male | 10 | 26.98 |
30-34 | Female | 9 | 33.33 |
40-44 | Male | 8 | 27.65 |
45-49 | Female | 10 | 32.27 |
| Male | 9 | 25.41 |
50-54 | Male | 5 | 24.24 |
55-59 | Female | 10 | 28.82 |
| Male | 9 | 28.65 |
60-64 | Female | 5 | 25.50 |
| Male | 6 | 28.89 |
65+ | Female | 6 | 31.26 |
| Male | 6 | 26.70 |
current_commercial_gender_age_10_salary = commercial_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_10_salary)
age_group_10 | gender | count_nonzero | median |
---|---|---|---|
25-34 | Female | 43.00 | 85000.00 |
| Male | 10.00 | 91500.00 |
35-44 | Female | 17.00 | 153000.00 |
| Male | 8.00 | 89031.01 |
45-54 | Female | 13.00 | 96170.00 |
| Male | 6.00 | 89303.54 |
55-64 | Female | 10.00 | 95154.04 |
| Male | 7.00 | 95050.62 |
65+ | Male | 6.00 | 93162.67 |
current_commercial_gender_age_10_hourly = commercial_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_age_10_hourly)
age_group_10 | gender | count_nonzero | median |
---|---|---|---|
<25 | Female | 6 | 30.12 |
25-34 | Female | 27 | 34.61 |
| Male | 14 | 26.98 |
35-44 | Female | 8 | 32.83 |
| Male | 12 | 28.85 |
45-54 | Female | 13 | 31.06 |
| Male | 14 | 25.09 |
55-64 | Female | 15 | 26.83 |
| Male | 15 | 28.65 |
65+ | Female | 6 | 31.26 |
| Male | 6 | 26.70 |
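The age-by-gender tables list Female and Male medians on separate rows, which makes a within-age-group comparison harder to read. A sketch, on made-up data, that pivots gender into columns with `unstack` and adds a Female/Male ratio; `female_to_male` is a hypothetical column name, and any age group with no rows for one gender comes out as NaN:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'age_group_5': ['25-29', '25-29', '25-29', '30-34', '30-34'],
    'gender': ['Female', 'Female', 'Male', 'Female', 'Female'],
    'current_base_pay': [34.0, 36.0, 27.0, 33.0, 35.0],
})

medians = (
    toy.groupby(['age_group_5', 'gender'])['current_base_pay']
    .median()
    .unstack('gender')  # one column per gender, one row per age group
)
medians['female_to_male'] = medians['Female'] / medians['Male']
print(medians)
```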
current_commercial_gender_salaried_under_40 = commercial_salaried[commercial_salaried['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_under_40)
gender | count_nonzero | median |
---|---|---|
Female | 59 | 90000.00 |
Male | 19 | 84780.00 |
current_commercial_gender_salaried_over_40 = commercial_salaried[commercial_salaried['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_salaried_over_40)
gender | count_nonzero | median |
---|---|---|
Female | 26 | 98381.29 |
Male | 22 | 92236.74 |
current_commercial_gender_hourly_under_40 = commercial_hourly[commercial_hourly['age'] < 40].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_under_40)
gender | count_nonzero | median |
---|---|---|
Female | 37 | 33.11 |
Male | 19 | 27.79 |
current_commercial_gender_hourly_over_40 = commercial_hourly[commercial_hourly['age'] > 39].groupby(['gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_gender_hourly_over_40)
gender | count_nonzero | median |
---|---|---|
Female | 38 | 29.36 |
Male | 43 | 26.75 |
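The under-40 and over-40 splits above use `age < 40` and `age > 39`, which are complementary only if `age` holds whole years. A sketch that computes both halves in one `groupby` from a single boolean column, so no row can land in both halves or in neither; `under_40` is a hypothetical column name and the data are made up:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    'age': [28, 39, 40, 52, 33, 61],
    'gender': ['Female', 'Male', 'Female', 'Male', 'Female', 'Male'],
    'current_base_pay': [90000.0, 85000.0, 98000.0, 92000.0, 88000.0, 95000.0],
})

split = (
    toy.assign(under_40=toy['age'] < 40)  # True for under 40, False otherwise
    .groupby(['under_40', 'gender'])['current_base_pay']
    .agg(['size', 'median'])
)
print(split)
```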
current_commercial_race_salaried = commercial_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_salaried)
race_ethnicity | count_nonzero |
---|---|
White (United States of America) | 88 |
Black or African American (United States of America) | 16 |
Asian (United States of America) | 13 |
Hispanic or Latino (United States of America) | 5 |
current_commercial_race_hourly = commercial_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_hourly)
race_ethnicity | count_nonzero |
---|---|
Black or African American (United States of America) | 75 |
White (United States of America) | 41 |
Asian (United States of America) | 8 |
Hispanic or Latino (United States of America) | 6 |
current_commercial_race_group_salaried = commercial_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_group_salaried)
race_grouping | count_nonzero |
---|---|
white | 88 |
person of color | 37 |
current_commercial_race_group_hourly = commercial_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero]})
suppress_count(current_commercial_race_group_hourly)
race_grouping | count_nonzero |
---|---|
person of color | 94 |
white | 41 |
current_commercial_race_median_salaried = commercial_salaried.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_median_salaried)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 88 | 95415.31 |
Black or African American (United States of America) | 16 | 83584.64 |
Hispanic or Latino (United States of America) | 5 | 83048.54 |
Asian (United States of America) | 13 | 80000.00 |
current_commercial_race_median_hourly = commercial_hourly.groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_median_hourly)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 41 | 32.36 |
Asian (United States of America) | 8 | 29.12 |
Black or African American (United States of America) | 75 | 28.23 |
Hispanic or Latino (United States of America) | 6 | 25.06 |
current_commercial_race_group_median_salaried = commercial_salaried.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_group_median_salaried)
race_grouping | count_nonzero | median |
---|---|---|
white | 88 | 95415.31 |
person of color | 37 | 80000.56 |
current_commercial_race_group_median_hourly = commercial_hourly.groupby(['race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_group_median_hourly)
race_grouping | count_nonzero | median |
---|---|---|
white | 41 | 32.36 |
person of color | 94 | 28.42 |
current_commercial_race_age_salaried = commercial_salaried.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_commercial_race_age_salaried
race_ethnicity | age |
---|---|
Black or African American (United States of America) | 48.00 |
White (United States of America) | 36.00 |
Prefer Not to Disclose (United States of America) | 33.00 |
Asian (United States of America) | 32.00 |
Hispanic or Latino (United States of America) | 29.00 |
Two or More Races (United States of America) | 28.00 |
current_commercial_race_age_hourly = commercial_hourly.groupby(['race_ethnicity'])['age'].median().sort_values(ascending=False)
current_commercial_race_age_hourly
race_ethnicity | age |
---|---|
Black or African American (United States of America) | 48.00 |
American Indian or Alaska Native (United States of America) | 40.00 |
White (United States of America) | 40.00 |
Prefer Not to Disclose (United States of America) | 34.00 |
Two or More Races (United States of America) | 33.00 |
Hispanic or Latino (United States of America) | 30.00 |
Asian (United States of America) | 28.50 |
current_commercial_race_age_5_salary = commercial_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_5_salary)
age_group_5 | race_ethnicity | count_nonzero | median |
---|---|---|---|
<25 | White (United States of America) | 5.00 | 80780.00 |
25-29 | Asian (United States of America) | 6.00 | 77625.00 |
25-29 | White (United States of America) | 26.00 | 86356.50 |
30-34 | White (United States of America) | 10.00 | 98847.80 |
35-39 | White (United States of America) | 15.00 | 155780.00 |
45-49 | Black or African American (United States of America) | 5.00 | 89225.00 |
45-49 | White (United States of America) | 6.00 | 151956.75 |
50-54 | White (United States of America) | 5.00 | 89382.07 |
55-59 | White (United States of America) | 12.00 | 92503.83 |
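The `age_group_5` and `age_group_10` columns are built earlier in the notebook. One way to produce the same labels is `pd.cut` with explicit edges. This sketch assumes whole-number ages and right-closed bins, which is an assumption, not the notebook's code. The labels are copied from the tables. `pd.cut` returns an ordered categorical, so `groupby` lists `<25` first, as these tables do. Plain strings would sort `<25` after `65+`, because `<` comes after the digits in ASCII.

```python
import numpy as np
import pandas as pd

# Assumed right-closed edges: (-inf, 24] -> '<25', (24, 29] -> '25-29', ..., (64, inf) -> '65+'.
AGE_5_BINS = [-np.inf, 24, 29, 34, 39, 44, 49, 54, 59, 64, np.inf]
AGE_5_LABELS = ['<25', '25-29', '30-34', '35-39', '40-44',
                '45-49', '50-54', '55-59', '60-64', '65+']
AGE_10_BINS = [-np.inf, 24, 34, 44, 54, 64, np.inf]
AGE_10_LABELS = ['<25', '25-34', '35-44', '45-54', '55-64', '65+']

def add_age_groups(df):
    """Hypothetical helper: return a copy of df with both age-group columns."""
    out = df.copy()
    out['age_group_5'] = pd.cut(out['age'], bins=AGE_5_BINS, labels=AGE_5_LABELS)
    out['age_group_10'] = pd.cut(out['age'], bins=AGE_10_BINS, labels=AGE_10_LABELS)
    return out

# Made-up ages for illustration only.
people = add_age_groups(pd.DataFrame({'age': [23, 25, 39, 40, 64, 65]}))
print(people)
```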
current_commercial_race_age_5_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_5_hourly)
age_group_5 | race_ethnicity | count_nonzero | median |
---|---|---|---|
25-29 | Black or African American (United States of America) | 8.00 | 29.94 |
25-29 | White (United States of America) | 13.00 | 35.90 |
30-34 | Black or African American (United States of America) | 5.00 | 27.88 |
35-39 | Black or African American (United States of America) | 5.00 | 32.51 |
40-44 | Black or African American (United States of America) | 5.00 | 28.96 |
45-49 | Black or African American (United States of America) | 15.00 | 30.14 |
50-54 | Black or African American (United States of America) | 6.00 | 23.99 |
55-59 | Black or African American (United States of America) | 14.00 | 28.99 |
55-59 | White (United States of America) | 5.00 | 27.98 |
60-64 | Black or African American (United States of America) | 8.00 | 25.75 |
65+ | Black or African American (United States of America) | 6.00 | 26.70 |
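Some tables print counts as `5.00` instead of `5`. One way this can happen is masking small cells with `where`. That converts an integer column to `float64` so it can hold `NaN`, and the notebook's `display.float_format` of `'%.2f'` then prints every float with two decimals. This is an assumption about the helpers, not a reading of their code. A small demonstration with made-up counts:

```python
import pandas as pd

counts = pd.Series([5, 3, 12], index=['a', 'b', 'c'])  # integer dtype
masked = counts.where(counts >= 5)   # the small cell becomes NaN, so the dtype is now float64
kept = masked.dropna()               # still float64 after dropping the NaN
formatted = ['%.2f' % value for value in kept]  # same format string as display.float_format
print(counts.dtype, kept.dtype, formatted)
```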
current_commercial_race_age_10_salary = commercial_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_10_salary)
age_group_10 | race_ethnicity | count_nonzero | median |
---|---|---|---|
<25 | White (United States of America) | 5.00 | 80780.00 |
25-34 | Asian (United States of America) | 9.00 | 80000.00 |
25-34 | White (United States of America) | 36.00 | 89181.30 |
35-44 | White (United States of America) | 19.00 | 153000.00 |
45-54 | Black or African American (United States of America) | 7.00 | 83000.00 |
45-54 | White (United States of America) | 11.00 | 141678.09 |
55-64 | White (United States of America) | 14.00 | 96872.70 |
current_commercial_race_age_10_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_age_10_hourly)
age_group_10 | race_ethnicity | count_nonzero | median |
---|---|---|---|
25-34 | Black or African American (United States of America) | 13.00 | 29.74 |
25-34 | White (United States of America) | 16.00 | 35.38 |
35-44 | Black or African American (United States of America) | 10.00 | 30.96 |
35-44 | White (United States of America) | 5.00 | 31.57 |
45-54 | Black or African American (United States of America) | 21.00 | 27.79 |
45-54 | White (United States of America) | 5.00 | 35.66 |
55-64 | Black or African American (United States of America) | 22.00 | 26.81 |
55-64 | White (United States of America) | 8.00 | 28.31 |
65+ | Black or African American (United States of America) | 6.00 | 26.70 |
current_commercial_race_group_age_5_salary = commercial_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_5_salary)
age_group_5 | race_grouping | count_nonzero | median |
---|---|---|---|
<25 | white | 5.00 | 80780.00 |
25-29 | person of color | 13.00 | 78500.00 |
25-29 | white | 26.00 | 86356.50 |
30-34 | white | 10.00 | 98847.80 |
35-39 | white | 15.00 | 155780.00 |
45-49 | person of color | 6.00 | 86112.50 |
45-49 | white | 6.00 | 151956.75 |
50-54 | white | 5.00 | 89382.07 |
55-59 | white | 12.00 | 92503.83 |
current_commercial_race_group_age_5_hourly = commercial_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_5_hourly)
age_group_5 | race_grouping | count_nonzero | median |
---|---|---|---|
25-29 | person of color | 14.00 | 29.68 |
25-29 | white | 13.00 | 35.90 |
30-34 | person of color | 10.00 | 29.07 |
35-39 | person of color | 7.00 | 32.51 |
40-44 | person of color | 7.00 | 28.96 |
45-49 | person of color | 16.00 | 28.96 |
50-54 | person of color | 6.00 | 23.99 |
55-59 | person of color | 14.00 | 28.99 |
55-59 | white | 5.00 | 27.98 |
60-64 | person of color | 8.00 | 25.75 |
65+ | person of color | 8.00 | 26.78 |
current_commercial_race_group_age_10_salary = commercial_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_10_salary)
age_group_10 | race_grouping | count_nonzero | median |
---|---|---|---|
<25 | white | 5.00 | 80780.00 |
25-34 | person of color | 16.00 | 80000.00 |
25-34 | white | 36.00 | 89181.30 |
35-44 | person of color | 6.00 | 87845.99 |
35-44 | white | 19.00 | 153000.00 |
45-54 | person of color | 8.00 | 81500.28 |
45-54 | white | 11.00 | 141678.09 |
55-64 | white | 14.00 | 96872.70 |
current_commercial_race_group_age_10_hourly = commercial_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_group_age_10_hourly)
age_group_10 | race_grouping | count_nonzero | median |
---|---|---|---|
25-34 | person of color | 24.00 | 29.68 |
25-34 | white | 16.00 | 35.38 |
35-44 | person of color | 14.00 | 30.96 |
35-44 | white | 5.00 | 31.57 |
45-54 | person of color | 22.00 | 26.60 |
45-54 | white | 5.00 | 35.66 |
55-64 | person of color | 22.00 | 26.81 |
55-64 | white | 8.00 | 28.31 |
65+ | person of color | 8.00 | 26.78 |
current_commercial_race_under_40_salaried = commercial_salaried[commercial_salaried['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_under_40_salaried)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 56 | 93852.75 |
Asian (United States of America) | 12 | 80000.00 |
current_commercial_race_over_40_salaried = commercial_salaried[commercial_salaried['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_over_40_salaried)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 32 | 104022.79 |
Black or African American (United States of America) | 12 | 85573.50 |
current_commercial_race_under_40_hourly = commercial_hourly[commercial_hourly['age'] < 40].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_under_40_hourly)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 20 | 34.74 |
Black or African American (United States of America) | 21 | 30.19 |
Asian (United States of America) | 5 | 29.63 |
Hispanic or Latino (United States of America) | 5 | 23.56 |
current_commercial_race_over_40_hourly = commercial_hourly[commercial_hourly['age'] > 39].groupby(['race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_race_over_40_hourly)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 21 | 31.29 |
Black or African American (United States of America) | 54 | 26.81 |
current_commercial_race_gender_salaried = commercial_salaried.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_race_gender_salaried)
race_ethnicity | gender | count_nonzero |
---|---|---|
Asian (United States of America) | Female | 8 |
Asian (United States of America) | Male | 5 |
Black or African American (United States of America) | Female | 9 |
Black or African American (United States of America) | Male | 7 |
White (United States of America) | Female | 62 |
White (United States of America) | Male | 26 |
current_commercial_race_gender_hourly = commercial_hourly.groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero]})
suppress(current_commercial_race_gender_hourly)
race_ethnicity | gender | count_nonzero |
---|---|---|
Asian (United States of America) | Female | 5 |
Black or African American (United States of America) | Female | 40 |
Black or African American (United States of America) | Male | 35 |
White (United States of America) | Female | 23 |
White (United States of America) | Male | 18 |
current_commercial_race_gender_median_salaried = commercial_salaried.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_median_salaried)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
person of color | Female | 22 | 80879.99 |
person of color | Male | 15 | 80000.56 |
white | Female | 62 | 98047.80 |
white | Male | 26 | 94693.21 |
current_commercial_race_gender_median_hourly = commercial_hourly.groupby(['race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_median_hourly)
race_grouping | gender | count_nonzero | median |
---|---|---|---|
person of color | Female | 52 | 29.70 |
person of color | Male | 42 | 26.70 |
white | Female | 23 | 34.61 |
white | Male | 18 | 28.70 |
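Long tables like the two race-by-gender tables put the Female and Male medians for each group on separate rows. The sketch below uses `unstack` to pivot the gender level into columns, so each group's medians sit side by side. The rows are made up, and the `female_to_male` ratio column is an illustrative addition, not something the notebook computes here.

```python
import pandas as pd

# Made-up rows for illustration only; not Post data.
toy = pd.DataFrame({
    'race_grouping': ['white', 'white', 'white',
                      'person of color', 'person of color', 'person of color'],
    'gender': ['Female', 'Female', 'Male', 'Female', 'Male', 'Male'],
    'current_base_pay': [100.0, 110.0, 90.0, 80.0, 70.0, 90.0],
})

# Median per (race_grouping, gender), then move gender from the row index to the columns.
wide = toy.groupby(['race_grouping', 'gender'])['current_base_pay'].median().unstack('gender')
wide['female_to_male'] = wide['Female'] / wide['Male']
print(wide)
```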
current_commercial_race_gender_under_40_salaried = commercial_salaried[commercial_salaried['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_under_40_salaried)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
Asian (United States of America) | Female | 8 | 81350.05 |
White (United States of America) | Female | 45 | 94202.10 |
White (United States of America) | Male | 11 | 93000.00 |
current_commercial_race_gender_under_40_hourly = commercial_hourly[commercial_hourly['age'] < 40].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_under_40_hourly)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
Black or African American (United States of America) | Female | 14 | 30.67 |
Black or African American (United States of America) | Male | 7 | 29.74 |
White (United States of America) | Female | 14 | 35.38 |
White (United States of America) | Male | 6 | 29.26 |
current_commercial_race_gender_over_40_salaried = commercial_salaried[commercial_salaried['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_over_40_salaried)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
Black or African American (United States of America) | Female | 7 | 83000.00 |
Black or African American (United States of America) | Male | 5 | 86977.73 |
White (United States of America) | Female | 17 | 123000.00 |
White (United States of America) | Male | 15 | 95050.62 |
current_commercial_race_gender_over_40_hourly = commercial_hourly[commercial_hourly['age'] > 39].groupby(['race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_race_gender_over_40_hourly)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
Black or African American (United States of America) | Female | 26 | 27.79 |
Black or African American (United States of America) | Male | 28 | 26.66 |
White (United States of America) | Female | 9 | 33.46 |
White (United States of America) | Male | 12 | 28.70 |
current_commercial_yos_salary = commercial_salaried.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_salary)
years_of_service_grouped | count_nonzero | median |
---|---|---|
0 | 14 | 84890.00 |
1-2 | 43 | 87075.00 |
3-5 | 24 | 99107.68 |
6-10 | 18 | 134000.05 |
11-15 | 7 | 83000.00 |
16-20 | 7 | 89957.05 |
21-25 | 7 | 86977.73 |
25+ | 6 | 89609.95 |
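The `years_of_service_grouped` labels include both `21-25` and `25+`, so it is not clear which bin holds someone with exactly 25 years. The sketch below assumes completed whole years and places 25 in `21-25`, with `25+` meaning more than 25. Both choices are assumptions, and the notebook's own grouping code, defined earlier, may differ. `group_years_of_service` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Assumed edges: 0 -> '0', 1-2 -> '1-2', ..., 21-25 -> '21-25', 26 and up -> '25+'.
YOS_BINS = [-np.inf, 0, 2, 5, 10, 15, 20, 25, np.inf]
YOS_LABELS = ['0', '1-2', '3-5', '6-10', '11-15', '16-20', '21-25', '25+']

def group_years_of_service(years):
    """Hypothetical helper: bin years of service after rounding down to completed years."""
    completed = np.floor(years)
    return pd.cut(completed, bins=YOS_BINS, labels=YOS_LABELS)

# Made-up tenures for illustration only.
tenure = pd.Series([0.4, 1.0, 2.9, 3.0, 25.0, 25.5, 26.0])
groups = group_years_of_service(tenure)
print(groups.tolist())
```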
current_commercial_yos_hourly = commercial_hourly.groupby(['years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_hourly)
years_of_service_grouped | count_nonzero | median |
---|---|---|
0 | 16 | 31.92 |
1-2 | 31 | 29.63 |
3-5 | 22 | 27.88 |
6-10 | 16 | 27.31 |
11-15 | 16 | 28.85 |
16-20 | 13 | 28.61 |
21-25 | 11 | 29.74 |
25+ | 12 | 28.64 |
current_commercial_yos_gender_salary = commercial_salaried.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_gender_salary)
years_of_service_grouped | gender | count_nonzero | median |
---|---|---|---|
0 | Female | 8 | 84890.00 |
0 | Male | 6 | 82500.00 |
1-2 | Female | 34 | 86877.54 |
1-2 | Male | 9 | 89382.07 |
3-5 | Female | 19 | 99465.35 |
3-5 | Male | 5 | 98750.00 |
6-10 | Female | 11 | 112000.00 |
6-10 | Male | 7 | 144476.99 |
16-20 | Female | 6 | 96685.35 |
21-25 | Male | 6 | 82338.57 |
current_commercial_yos_gender_hourly = commercial_hourly.groupby(['years_of_service_grouped','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_gender_hourly)
years_of_service_grouped | gender | count_nonzero | median |
---|---|---|---|
0 | Female | 10.00 | 34.10 |
0 | Male | 6.00 | 27.32 |
1-2 | Female | 17.00 | 31.06 |
1-2 | Male | 14.00 | 24.62 |
3-5 | Female | 12.00 | 33.09 |
3-5 | Male | 10.00 | 25.57 |
6-10 | Male | 12.00 | 26.61 |
11-15 | Male | 12.00 | 28.73 |
16-20 | Female | 8.00 | 27.33 |
16-20 | Male | 5.00 | 32.36 |
21-25 | Female | 8.00 | 28.55 |
25+ | Female | 12.00 | 28.64 |
current_commercial_yos_race_salary = commercial_salaried.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_salary)
years_of_service_grouped | race_ethnicity | count_nonzero | median |
---|---|---|---|
0 | White (United States of America) | 8.00 | 92890.00 |
1-2 | Asian (United States of America) | 7.00 | 77655.00 |
1-2 | White (United States of America) | 31.00 | 89382.07 |
3-5 | White (United States of America) | 18.00 | 99732.68 |
6-10 | White (United States of America) | 15.00 | 144476.99 |
16-20 | White (United States of America) | 5.00 | 114610.73 |
21-25 | White (United States of America) | 5.00 | 99347.61 |
current_commercial_yos_race_hourly = commercial_hourly.groupby(['years_of_service_grouped','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_hourly)
years_of_service_grouped | race_ethnicity | count_nonzero | median |
---|---|---|---|
0 | Black or African American (United States of America) | 8.00 | 29.69 |
0 | White (United States of America) | 7.00 | 35.90 |
1-2 | Asian (United States of America) | 5.00 | 29.63 |
1-2 | Black or African American (United States of America) | 13.00 | 29.18 |
1-2 | White (United States of America) | 7.00 | 33.11 |
3-5 | Black or African American (United States of America) | 11.00 | 26.57 |
3-5 | White (United States of America) | 9.00 | 31.57 |
6-10 | Black or African American (United States of America) | 9.00 | 26.75 |
6-10 | White (United States of America) | 5.00 | 31.29 |
11-15 | Black or African American (United States of America) | 8.00 | 30.05 |
11-15 | White (United States of America) | 6.00 | 28.70 |
16-20 | Black or African American (United States of America) | 7.00 | 25.50 |
21-25 | Black or African American (United States of America) | 10.00 | 30.91 |
25+ | Black or African American (United States of America) | 9.00 | 26.79 |
current_commercial_yos_race_gender_salary = commercial_salaried.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_gender_salary)
years_of_service_grouped | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
0 | white | Female | 5.00 | 110000.00 |
1-2 | person of color | Female | 10.00 | 77717.50 |
1-2 | white | Female | 23.00 | 88362.60 |
1-2 | white | Male | 8.00 | 91191.04 |
3-5 | white | Female | 16.00 | 109732.68 |
6-10 | white | Female | 10.00 | 125000.27 |
6-10 | white | Male | 5.00 | 155942.78 |
current_commercial_yos_race_gender_hourly = commercial_hourly.groupby(['years_of_service_grouped','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_yos_race_gender_hourly)
years_of_service_grouped | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
0 | person of color | Male | 5.00 | 25.41 |
0 | white | Female | 6.00 | 35.38 |
1-2 | person of color | Female | 12.00 | 30.16 |
1-2 | person of color | Male | 10.00 | 25.41 |
1-2 | white | Female | 5.00 | 33.11 |
3-5 | person of color | Female | 6.00 | 27.33 |
3-5 | person of color | Male | 7.00 | 26.57 |
3-5 | white | Female | 6.00 | 36.11 |
6-10 | person of color | Male | 8.00 | 26.61 |
11-15 | person of color | Male | 6.00 | 28.84 |
11-15 | white | Male | 6.00 | 28.70 |
16-20 | person of color | Female | 6.00 | 27.05 |
21-25 | person of color | Female | 8.00 | 28.55 |
25+ | person of color | Female | 9.00 | 26.79 |
current_median_commercial_age_5_salaried = commercial_salaried.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_salaried)
age_group_5 | count_nonzero | median |
---|---|---|
<25 | 6 | 80390.00 |
25-29 | 39 | 84780.00 |
30-34 | 14 | 98847.80 |
35-39 | 19 | 112000.00 |
40-44 | 6 | 125875.00 |
45-49 | 12 | 109585.00 |
50-54 | 7 | 80000.56 |
55-59 | 14 | 92503.83 |
65+ | 6 | 93162.67 |
current_median_commercial_age_5_hourly = commercial_hourly.groupby(['age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_hourly)
age_group_5 | count_nonzero | median |
---|---|---|
<25 | 7 | 30.06 |
25-29 | 28 | 31.61 |
30-34 | 13 | 33.11 |
35-39 | 8 | 32.44 |
40-44 | 12 | 28.85 |
45-49 | 19 | 31.06 |
50-54 | 8 | 24.20 |
55-59 | 19 | 28.65 |
60-64 | 11 | 26.00 |
65+ | 12 | 27.52 |
current_median_commercial_age_10_salaried = commercial_salaried.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_salaried)
age_group_10 | count_nonzero | median |
---|---|---|
<25 | 6 | 80390.00 |
25-34 | 53 | 85045.10 |
35-44 | 25 | 112000.00 |
45-54 | 19 | 94516.42 |
55-64 | 17 | 95050.62 |
65+ | 6 | 93162.67 |
current_median_commercial_age_10_hourly = commercial_hourly.groupby(['age_group_10']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_hourly)
age_group_10 | count_nonzero | median |
---|---|---|
<25 | 7 | 30.06 |
25-34 | 41 | 32.05 |
35-44 | 20 | 30.96 |
45-54 | 27 | 28.61 |
55-64 | 30 | 27.14 |
65+ | 12 | 27.52 |
current_commercial_age_5_yos_salary = commercial_salaried.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_5_yos_salary)
age_group_5 | years_of_service_grouped | count_nonzero | median |
---|---|---|---|
25-29 | 0 | 9.00 | 84780.00 |
25-29 | 1-2 | 22.00 | 81217.50 |
25-29 | 3-5 | 7.00 | 83730.75 |
30-34 | 1-2 | 7.00 | 90702.30 |
30-34 | 3-5 | 5.00 | 109568.81 |
35-39 | 6-10 | 8.00 | 133890.00 |
current_commercial_age_5_yos_hourly = commercial_hourly.groupby(['age_group_5','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_5_yos_hourly)
age_group_5 | years_of_service_grouped | count_nonzero | median |
---|---|---|---|
25-29 | 0 | 9.00 | 35.90 |
25-29 | 1-2 | 12.00 | 30.45 |
25-29 | 3-5 | 7.00 | 27.79 |
30-34 | 1-2 | 5.00 | 33.11 |
55-59 | 3-5 | 5.00 | 27.98 |
55-59 | 25+ | 5.00 | 32.21 |
current_commercial_age_10_yos_salary = commercial_salaried.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_10_yos_salary)
age_group_10 | years_of_service_grouped | count_nonzero | median |
---|---|---|---|
25-34 | 0 | 10.00 | 84780.00 |
25-34 | 1-2 | 29.00 | 85045.10 |
25-34 | 3-5 | 12.00 | 97232.68 |
35-44 | 3-5 | 5.00 | 153000.00 |
35-44 | 6-10 | 9.00 | 155780.00 |
45-54 | 1-2 | 5.00 | 94516.42 |
current_commercial_age_10_yos_hourly = commercial_hourly.groupby(['age_group_10','years_of_service_grouped']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_commercial_age_10_yos_hourly)
age_group_10 | years_of_service_grouped | count_nonzero | median |
---|---|---|---|
25-34 | 0 | 13.00 | 33.33 |
25-34 | 1-2 | 17.00 | 31.17 |
25-34 | 3-5 | 9.00 | 27.79 |
35-44 | 11-15 | 7.00 | 32.12 |
45-54 | 1-2 | 5.00 | 23.83 |
45-54 | 6-10 | 6.00 | 24.67 |
45-54 | 21-25 | 5.00 | 32.47 |
55-64 | 3-5 | 6.00 | 26.49 |
55-64 | 16-20 | 5.00 | 26.00 |
55-64 | 21-25 | 5.00 | 27.35 |
55-64 | 25+ | 7.00 | 26.79 |
current_median_commercial_age_5_gender_salaried = commercial_salaried.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_gender_salaried)
age_group_5 | gender | count_nonzero | median |
---|---|---|---|
25-29 | Female | 30.00 | 83389.64 |
25-29 | Male | 9.00 | 90000.00 |
30-34 | Female | 13.00 | 97695.60 |
35-39 | Female | 14.00 | 133890.00 |
35-39 | Male | 5.00 | 80000.00 |
45-49 | Female | 11.00 | 123000.00 |
50-54 | Male | 5.00 | 89382.07 |
55-59 | Female | 8.00 | 97813.00 |
55-59 | Male | 6.00 | 92503.83 |
65+ | Male | 6.00 | 93162.67 |
current_median_commercial_age_5_gender_hourly = commercial_hourly.groupby(['age_group_5','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_gender_hourly)
age_group_5 | gender | count_nonzero | median |
---|---|---|---|
<25 | Female | 6 | 30.12 |
25-29 | Female | 18 | 35.13 |
25-29 | Male | 10 | 26.98 |
30-34 | Female | 9 | 33.33 |
40-44 | Male | 8 | 27.65 |
45-49 | Female | 10 | 32.27 |
45-49 | Male | 9 | 25.41 |
50-54 | Male | 5 | 24.24 |
55-59 | Female | 10 | 28.82 |
55-59 | Male | 9 | 28.65 |
60-64 | Female | 5 | 25.50 |
60-64 | Male | 6 | 28.89 |
65+ | Female | 6 | 31.26 |
65+ | Male | 6 | 26.70 |
current_median_commercial_age_10_gender_salaried = commercial_salaried.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_gender_salaried)
age_group_10 | gender | count_nonzero | median |
---|---|---|---|
25-34 | Female | 43.00 | 85000.00 |
25-34 | Male | 10.00 | 91500.00 |
35-44 | Female | 17.00 | 153000.00 |
35-44 | Male | 8.00 | 89031.01 |
45-54 | Female | 13.00 | 96170.00 |
45-54 | Male | 6.00 | 89303.54 |
55-64 | Female | 10.00 | 95154.04 |
55-64 | Male | 7.00 | 95050.62 |
65+ | Male | 6.00 | 93162.67 |
current_median_commercial_age_10_gender_hourly = commercial_hourly.groupby(['age_group_10','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_gender_hourly)
age_group_10 | gender | count_nonzero | median |
---|---|---|---|
<25 | Female | 6 | 30.12 |
25-34 | Female | 27 | 34.61 |
25-34 | Male | 14 | 26.98 |
35-44 | Female | 8 | 32.83 |
35-44 | Male | 12 | 28.85 |
45-54 | Female | 13 | 31.06 |
45-54 | Male | 14 | 25.09 |
55-64 | Female | 15 | 26.83 |
55-64 | Male | 15 | 28.65 |
65+ | Female | 6 | 31.26 |
65+ | Male | 6 | 26.70 |
current_median_commercial_age_5_race_salaried = commercial_salaried.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_salaried)
age_group_5 | race_ethnicity | count_nonzero | median |
---|---|---|---|
<25 | White (United States of America) | 5.00 | 80780.00 |
25-29 | Asian (United States of America) | 6.00 | 77625.00 |
25-29 | White (United States of America) | 26.00 | 86356.50 |
30-34 | White (United States of America) | 10.00 | 98847.80 |
35-39 | White (United States of America) | 15.00 | 155780.00 |
45-49 | Black or African American (United States of America) | 5.00 | 89225.00 |
45-49 | White (United States of America) | 6.00 | 151956.75 |
50-54 | White (United States of America) | 5.00 | 89382.07 |
55-59 | White (United States of America) | 12.00 | 92503.83 |
current_median_commercial_age_5_race_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_hourly)
age_group_5 | race_ethnicity | count_nonzero | median |
---|---|---|---|
25-29 | Black or African American (United States of America) | 8.00 | 29.94 |
25-29 | White (United States of America) | 13.00 | 35.90 |
30-34 | Black or African American (United States of America) | 5.00 | 27.88 |
35-39 | Black or African American (United States of America) | 5.00 | 32.51 |
40-44 | Black or African American (United States of America) | 5.00 | 28.96 |
45-49 | Black or African American (United States of America) | 15.00 | 30.14 |
50-54 | Black or African American (United States of America) | 6.00 | 23.99 |
55-59 | Black or African American (United States of America) | 14.00 | 28.99 |
55-59 | White (United States of America) | 5.00 | 27.98 |
60-64 | Black or African American (United States of America) | 8.00 | 25.75 |
65+ | Black or African American (United States of America) | 6.00 | 26.70 |
current_median_commercial_age_5_race_group_salaried = commercial_salaried.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_salaried)
age_group_5 | race_grouping | count_nonzero | median |
---|---|---|---|
<25 | white | 5.00 | 80780.00 |
25-29 | person of color | 13.00 | 78500.00 |
25-29 | white | 26.00 | 86356.50 |
30-34 | white | 10.00 | 98847.80 |
35-39 | white | 15.00 | 155780.00 |
45-49 | person of color | 6.00 | 86112.50 |
45-49 | white | 6.00 | 151956.75 |
50-54 | white | 5.00 | 89382.07 |
55-59 | white | 12.00 | 92503.83 |
current_median_commercial_age_5_race_group_hourly = commercial_hourly.groupby(['age_group_5','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_hourly)
age_group_5 | race_grouping | count_nonzero | median |
---|---|---|---|
25-29 | person of color | 14.00 | 29.68 |
25-29 | white | 13.00 | 35.90 |
30-34 | person of color | 10.00 | 29.07 |
35-39 | person of color | 7.00 | 32.51 |
40-44 | person of color | 7.00 | 28.96 |
45-49 | person of color | 16.00 | 28.96 |
50-54 | person of color | 6.00 | 23.99 |
55-59 | person of color | 14.00 | 28.99 |
55-59 | white | 5.00 | 27.98 |
60-64 | person of color | 8.00 | 25.75 |
65+ | person of color | 8.00 | 26.78 |
current_median_commercial_age_10_race_salaried = commercial_salaried.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_salaried)
age_group_10 | race_ethnicity | count_nonzero | median |
---|---|---|---|
<25 | White (United States of America) | 5.00 | 80780.00 |
25-34 | Asian (United States of America) | 9.00 | 80000.00 |
25-34 | White (United States of America) | 36.00 | 89181.30 |
35-44 | White (United States of America) | 19.00 | 153000.00 |
45-54 | Black or African American (United States of America) | 7.00 | 83000.00 |
45-54 | White (United States of America) | 11.00 | 141678.09 |
55-64 | White (United States of America) | 14.00 | 96872.70 |
current_median_commercial_age_10_race_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_hourly)
age_group_10 | race_ethnicity | count_nonzero | median |
---|---|---|---|
25-34 | Black or African American (United States of America) | 13.00 | 29.74 |
25-34 | White (United States of America) | 16.00 | 35.38 |
35-44 | Black or African American (United States of America) | 10.00 | 30.96 |
35-44 | White (United States of America) | 5.00 | 31.57 |
45-54 | Black or African American (United States of America) | 21.00 | 27.79 |
45-54 | White (United States of America) | 5.00 | 35.66 |
55-64 | Black or African American (United States of America) | 22.00 | 26.81 |
55-64 | White (United States of America) | 8.00 | 28.31 |
65+ | Black or African American (United States of America) | 6.00 | 26.70 |
current_median_commercial_age_10_race_group_salaried = commercial_salaried.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_salaried)
age_group_10 | race_grouping | count_nonzero | median |
---|---|---|---|
<25 | white | 5.00 | 80780.00 |
25-34 | person of color | 16.00 | 80000.00 |
25-34 | white | 36.00 | 89181.30 |
35-44 | person of color | 6.00 | 87845.99 |
35-44 | white | 19.00 | 153000.00 |
45-54 | person of color | 8.00 | 81500.28 |
45-54 | white | 11.00 | 141678.09 |
55-64 | white | 14.00 | 96872.70 |
current_median_commercial_age_10_race_group_hourly = commercial_hourly.groupby(['age_group_10','race_grouping']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_hourly)
age_group_10 | race_grouping | count_nonzero | median |
---|---|---|---|
25-34 | person of color | 24.00 | 29.68 |
25-34 | white | 16.00 | 35.38 |
35-44 | person of color | 14.00 | 30.96 |
35-44 | white | 5.00 | 31.57 |
45-54 | person of color | 22.00 | 26.60 |
45-54 | white | 5.00 | 35.66 |
55-64 | person of color | 22.00 | 26.81 |
55-64 | white | 8.00 | 28.31 |
65+ | person of color | 8.00 | 26.78 |
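The `race_grouping` column used in these tables collapses `race_ethnicity` into two values, `white` and `person of color`; the code that builds it is not part of this section. A minimal sketch of such a mapping follows. `to_race_grouping` is a hypothetical name, and treating "Prefer Not to Disclose" and missing values as NaN (so they drop out of `groupby`) is an assumption, not necessarily what the study does.

```python
import numpy as np
import pandas as pd

WHITE = "White (United States of America)"
UNDISCLOSED = "Prefer Not to Disclose (United States of America)"


def to_race_grouping(race: pd.Series) -> pd.Series:
    """Collapse detailed race_ethnicity values into 'white' / 'person of color'.

    Assumption: undisclosed or missing values become NaN so they are left
    out of groupby results instead of being counted in either group.
    """
    grouping = np.where(race == WHITE, "white", "person of color")
    out = pd.Series(grouping, index=race.index, dtype="object")
    out[race.isna() | (race == UNDISCLOSED)] = np.nan
    return out


sample = pd.Series([WHITE, "Asian (United States of America)", UNDISCLOSED, None])
grouped = to_race_grouping(sample)
```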
current_median_commercial_age_5_race_gender_salaried = commercial_salaried.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_gender_salaried)
age_group_5 | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
25-29 | Asian (United States of America) | Female | 5.00 | 77595.00 |
25-29 | White (United States of America) | Female | 20.00 | 84809.55 |
25-29 | White (United States of America) | Male | 6.00 | 93935.00 |
30-34 | White (United States of America) | Female | 9.00 | 97695.60 |
35-39 | White (United States of America) | Female | 14.00 | 133890.00 |
45-49 | White (United States of America) | Female | 6.00 | 151956.75 |
55-59 | White (United States of America) | Female | 6.00 | 97813.00 |
55-59 | White (United States of America) | Male | 6.00 | 92503.83 |
current_median_commercial_age_5_race_gender_hourly = commercial_hourly.groupby(['age_group_5','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_gender_hourly)
age_group_5 | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
25-29 | Black or African American (United States of America) | Female | 6.00 | 32.90 |
25-29 | White (United States of America) | Female | 8.00 | 37.77 |
25-29 | White (United States of America) | Male | 5.00 | 26.16 |
45-49 | Black or African American (United States of America) | Female | 6.00 | 31.57 |
45-49 | Black or African American (United States of America) | Male | 9.00 | 25.41 |
55-59 | Black or African American (United States of America) | Female | 9.00 | 29.66 |
55-59 | Black or African American (United States of America) | Male | 5.00 | 28.97 |
60-64 | Black or African American (United States of America) | Female | 5.00 | 25.50 |
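The `age_group_5` and `age_group_10` bands are also built outside this section. The sketch below shows one way to derive the ten-year bands with `pd.cut`. It assumes ages are measured on May 28, 2021 (the date the files were turned over) and that `date_of_birth` has been parsed to datetime64; `age_on`, `age_group_10` and `REFERENCE_DATE` are hypothetical names. Five-year bands would use the same call with five-year edges.

```python
import numpy as np
import pandas as pd

# Assumption: ages are measured on the date management turned over the files.
REFERENCE_DATE = pd.Timestamp("2021-05-28")


def age_on(dob: pd.Series, ref: pd.Timestamp = REFERENCE_DATE) -> pd.Series:
    """Whole years of age on `ref` for a datetime64 Series of birth dates."""
    # True when this year's birthday has not happened yet on `ref`.
    birthday_not_yet = (dob.dt.month > ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day > ref.day)
    )
    return ref.year - dob.dt.year - birthday_not_yet.astype(int)


def age_group_10(age: pd.Series) -> pd.Series:
    """Ten-year bands; right=False makes each bin include its lower edge."""
    return pd.cut(
        age,
        bins=[-np.inf, 25, 35, 45, 55, 65, np.inf],
        right=False,
        labels=["<25", "25-34", "35-44", "45-54", "55-64", "65+"],
    )


dob = pd.to_datetime(pd.Series(["1996-05-29", "1996-05-28", "1956-01-01"]))
ages = age_on(dob)          # 24, 25, 65
bands = age_group_10(ages)  # '<25', '25-34', '65+'
```

Using `right=False` matters here: with the default closed-right bins, someone who is exactly 25 would land in `<25`.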
current_median_commercial_age_5_race_group_gender_salaried = commercial_salaried.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_gender_salaried)
age_group_5 | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
25-29 | person of color | Female | 10.00 | 77625.00 |
25-29 | white | Female | 20.00 | 84809.55 |
25-29 | white | Male | 6.00 | 93935.00 |
30-34 | white | Female | 9.00 | 97695.60 |
35-39 | white | Female | 14.00 | 133890.00 |
45-49 | person of color | Female | 5.00 | 83000.00 |
45-49 | white | Female | 6.00 | 151956.75 |
55-59 | white | Female | 6.00 | 97813.00 |
55-59 | white | Male | 6.00 | 92503.83 |
current_median_commercial_age_5_race_group_gender_hourly = commercial_hourly.groupby(['age_group_5','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_5_race_group_gender_hourly)
age_group_5 | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
25-29 | person of color | Female | 10.00 | 30.66 |
25-29 | white | Female | 8.00 | 37.77 |
25-29 | white | Male | 5.00 | 26.16 |
30-34 | person of color | Female | 6.00 | 30.60 |
40-44 | person of color | Male | 5.00 | 26.56 |
45-49 | person of color | Female | 7.00 | 31.06 |
45-49 | person of color | Male | 9.00 | 25.41 |
55-59 | person of color | Female | 9.00 | 29.66 |
55-59 | person of color | Male | 5.00 | 28.97 |
60-64 | person of color | Female | 5.00 | 25.50 |
65+ | person of color | Male | 5.00 | 26.75 |
current_median_commercial_age_10_race_gender_salaried = commercial_salaried.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_gender_salaried)
age_group_10 | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
25-34 | Asian (United States of America) | Female | 8.00 | 81350.05 |
25-34 | White (United States of America) | Female | 29.00 | 86680.09 |
25-34 | White (United States of America) | Male | 7.00 | 94870.00 |
35-44 | White (United States of America) | Female | 17.00 | 153000.00 |
45-54 | Black or African American (United States of America) | Female | 5.00 | 83000.00 |
45-54 | White (United States of America) | Female | 7.00 | 148624.00 |
55-64 | White (United States of America) | Female | 7.00 | 100592.59 |
55-64 | White (United States of America) | Male | 7.00 | 95050.62 |
current_median_commercial_age_10_race_gender_hourly = commercial_hourly.groupby(['age_group_10','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_gender_hourly)
age_group_10 | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
25-34 | Black or African American (United States of America) | Female | 9.00 | 30.14 |
25-34 | White (United States of America) | Female | 11.00 | 35.90 |
25-34 | White (United States of America) | Male | 5.00 | 26.16 |
35-44 | Black or African American (United States of America) | Male | 6.00 | 29.09 |
45-54 | Black or African American (United States of America) | Female | 9.00 | 30.14 |
45-54 | Black or African American (United States of America) | Male | 12.00 | 25.09 |
55-64 | Black or African American (United States of America) | Female | 14.00 | 26.81 |
55-64 | Black or African American (United States of America) | Male | 8.00 | 27.81 |
55-64 | White (United States of America) | Male | 7.00 | 28.65 |
current_median_commercial_age_10_race_group_gender_salaried = commercial_salaried.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_gender_salaried)
age_group_10 | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
25-34 | person of color | Female | 13.00 | 78500.00 |
25-34 | white | Female | 29.00 | 86680.09 |
25-34 | white | Male | 7.00 | 94870.00 |
35-44 | person of color | Male | 6.00 | 87845.99 |
35-44 | white | Female | 17.00 | 153000.00 |
45-54 | person of color | Female | 6.00 | 80879.99 |
45-54 | white | Female | 7.00 | 148624.00 |
55-64 | white | Female | 7.00 | 100592.59 |
55-64 | white | Male | 7.00 | 95050.62 |
current_median_commercial_age_10_race_group_gender_hourly = commercial_hourly.groupby(['age_group_10','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress(current_median_commercial_age_10_race_group_gender_hourly)
age_group_10 | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
25-34 | person of color | Female | 16.00 | 30.66 |
25-34 | person of color | Male | 8.00 | 28.00 |
25-34 | white | Female | 11.00 | 35.90 |
25-34 | white | Male | 5.00 | 26.16 |
35-44 | person of color | Female | 6.00 | 33.25 |
35-44 | person of color | Male | 8.00 | 27.76 |
45-54 | person of color | Female | 10.00 | 29.38 |
45-54 | person of color | Male | 12.00 | 25.09 |
55-64 | person of color | Female | 14.00 | 26.81 |
55-64 | person of color | Male | 8.00 | 27.81 |
55-64 | white | Male | 7.00 | 28.65 |
65+ | person of color | Male | 5.00 | 26.75 |
current_commercial_median_department_salaried = commercial_salaried.groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_salaried)
department | count_nonzero | median |
---|---|---|
Finance | 9 | 95000.00 |
Client Solutions | 91 | 90241.58 |
Audience Development and Insights | 18 | 90000.00 |
Production | 5 | 75000.51 |
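`suppress`, `suppress_median` and `suppress_count` are defined earlier in the notebook and are not shown in this section. Judging only from the outputs, they appear to hide groups with fewer than five members, and `suppress_median` also sorts by the median, highest first. The function below is a hypothetical reimplementation of that behavior under those assumptions (the threshold of five is inferred from the smallest counts printed), not the study's own helper; the data are made up.

```python
import pandas as pd

MIN_GROUP_SIZE = 5  # assumption: inferred from the smallest counts shown in the tables


def suppress_small_groups(table: pd.DataFrame, count_col, sort_col=None,
                          min_size: int = MIN_GROUP_SIZE) -> pd.DataFrame:
    """Drop rows whose count is below min_size; optionally sort descending by sort_col.

    count_col and sort_col may be tuples when the table has two-level columns,
    e.g. ('current_base_pay', 'count').
    """
    kept = table[table[count_col] >= min_size]
    if sort_col is not None:
        kept = kept.sort_values(sort_col, ascending=False)
    return kept


# Made-up data: department C has only two people and should be hidden.
toy = pd.DataFrame({
    "department": ["A"] * 5 + ["B"] * 6 + ["C"] * 2,
    "current_base_pay": [1, 2, 3, 4, 5] + [10] * 6 + [50, 60],
})
table = toy.groupby("department").agg({"current_base_pay": ["count", "median"]})
out = suppress_small_groups(
    table,
    count_col=("current_base_pay", "count"),
    sort_col=("current_base_pay", "median"),
)
```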
current_commercial_median_department_hourly = commercial_hourly.groupby(['department']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_hourly)
department | count_nonzero | median |
---|---|---|
Marketing | 7 | 39.64 |
Public Relations | 7 | 38.40 |
Washington Post Live | 7 | 33.33 |
Client Solutions | 48 | 31.23 |
Finance | 23 | 30.26 |
Production | 33 | 25.41 |
Customer Care and Logistics | 12 | 21.67 |
current_commercial_median_department_gender_salaried = commercial_salaried.groupby(['department','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_gender_salaried)
department | gender | count_nonzero | median |
---|---|---|---|
Finance | Female | 6 | 95585.00 |
Audience Development and Insights | Female | 8 | 94391.70 |
Client Solutions | Female | 66 | 90471.94 |
Client Solutions | Male | 25 | 89957.05 |
Audience Development and Insights | Male | 10 | 87500.00 |
current_commercial_median_department_gender_hourly = commercial_hourly.groupby(['department','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_gender_hourly)
department | gender | count_nonzero | median |
---|---|---|---|
Marketing | Female | 7 | 39.64 |
Public Relations | Female | 7 | 38.40 |
Washington Post Live | Female | 5 | 33.33 |
Client Solutions | Male | 18 | 32.24 |
Finance | Female | 16 | 30.60 |
Finance | Male | 7 | 30.26 |
Client Solutions | Female | 30 | 30.16 |
Production | Male | 32 | 25.41 |
Customer Care and Logistics | Female | 9 | 21.63 |
current_commercial_median_department_race_salaried = commercial_salaried.groupby(['department','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_salaried)
department | race_ethnicity | count_nonzero | median |
---|---|---|---|
Client Solutions | White (United States of America) | 68 | 99406.48 |
Audience Development and Insights | White (United States of America) | 14 | 93935.00 |
Client Solutions | Black or African American (United States of America) | 9 | 78759.98 |
Client Solutions | Asian (United States of America) | 10 | 77677.21 |
current_commercial_median_department_race_hourly = commercial_hourly.groupby(['department','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_hourly)
department | race_ethnicity | count_nonzero | median |
---|---|---|---|
Marketing | White (United States of America) | 5 | 39.64 |
Client Solutions | White (United States of America) | 16 | 33.28 |
Client Solutions | Asian (United States of America) | 5 | 31.17 |
Finance | Black or African American (United States of America) | 17 | 30.26 |
Client Solutions | Black or African American (United States of America) | 23 | 29.18 |
Production | White (United States of America) | 7 | 25.23 |
Production | Black or African American (United States of America) | 22 | 25.20 |
Customer Care and Logistics | Black or African American (United States of America) | 8 | 21.24 |
current_commercial_median_department_race_gender_salaried = commercial_salaried.groupby(['department','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_salaried)
department | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
Client Solutions | White (United States of America) | Female | 51 | 99465.35 |
Client Solutions | White (United States of America) | Male | 17 | 99347.61 |
Audience Development and Insights | White (United States of America) | Female | 7 | 98783.40 |
Audience Development and Insights | White (United States of America) | Male | 7 | 93000.00 |
Client Solutions | Black or African American (United States of America) | Female | 6 | 78629.99 |
Client Solutions | Asian (United States of America) | Female | 6 | 77625.00 |
current_commercial_median_department_race_gender_hourly = commercial_hourly.groupby(['department','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_hourly)
department | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
Marketing | White (United States of America) | Female | 5 | 39.64 |
Client Solutions | White (United States of America) | Female | 8 | 34.03 |
Client Solutions | White (United States of America) | Male | 8 | 32.24 |
Client Solutions | Black or African American (United States of America) | Male | 8 | 31.41 |
Finance | Black or African American (United States of America) | Female | 12 | 30.60 |
Finance | Black or African American (United States of America) | Male | 5 | 30.26 |
Client Solutions | Black or African American (United States of America) | Female | 15 | 28.61 |
Production | Black or African American (United States of America) | Male | 21 | 25.41 |
Production | White (United States of America) | Male | 7 | 25.23 |
Customer Care and Logistics | Black or African American (United States of America) | Female | 8 | 21.24 |
current_commercial_median_department_race_group_gender_salaried = commercial_salaried.groupby(['department','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_salaried)
department | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
Client Solutions | white | Female | 51 | 99465.35 |
Client Solutions | white | Male | 17 | 99347.61 |
Audience Development and Insights | white | Female | 7 | 98783.40 |
Audience Development and Insights | white | Male | 7 | 93000.00 |
Client Solutions | person of color | Male | 8 | 82084.64 |
Client Solutions | person of color | Female | 15 | 77655.00 |
current_commercial_median_department_race_group_gender_hourly = commercial_hourly.groupby(['department','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_hourly)
department | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
Marketing | white | Female | 5 | 39.64 |
Client Solutions | white | Female | 8 | 34.03 |
Client Solutions | white | Male | 8 | 32.24 |
Client Solutions | person of color | Male | 10 | 31.41 |
Finance | person of color | Female | 13 | 30.14 |
Finance | person of color | Male | 6 | 29.75 |
Client Solutions | person of color | Female | 22 | 28.61 |
Production | person of color | Male | 24 | 25.41 |
Production | white | Male | 7 | 25.23 |
Customer Care and Logistics | person of color | Female | 8 | 21.24 |
current_commercial_median_department_race_gender_age5_salaried = commercial_salaried.groupby(['department','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_age5_salaried)
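department | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
Client Solutions | White (United States of America) | Female | 35-39 | 12.00 | 158686.70 |
Client Solutions | White (United States of America) | Female | 45-49 | 5.00 | 155289.50 |
Client Solutions | White (United States of America) | Female | 55-59 | 5.00 | 107453.00 |
Client Solutions | White (United States of America) | Female | 30-34 | 8.00 | 93029.10 |
Client Solutions | White (United States of America) | Female | 25-29 | 16.00 | 84284.93 |
Client Solutions | Asian (United States of America) | Female | 25-29 | 5.00 | 77595.00 |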
current_commercial_median_department_race_gender_age5_hourly = commercial_hourly.groupby(['department','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_gender_age5_hourly)
department | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
Marketing | White (United States of America) | Female | 25-29 | 5.00 | 39.64 |
Finance | Black or African American (United States of America) | Female | 45-49 | 5.00 | 31.06 |
Production | Black or African American (United States of America) | Male | 45-49 | 6.00 | 24.67 |
current_commercial_median_department_race_group_gender_age5_salaried = commercial_salaried.groupby(['department','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_age5_salaried)
department | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
Client Solutions | white | Female | 35-39 | 12.00 | 158686.70 |
Client Solutions | white | Female | 45-49 | 5.00 | 155289.50 |
Client Solutions | white | Female | 55-59 | 5.00 | 107453.00 |
Client Solutions | white | Female | 30-34 | 8.00 | 93029.10 |
Client Solutions | white | Female | 25-29 | 16.00 | 84284.93 |
Client Solutions | person of color | Female | 25-29 | 9.00 | 77595.00 |
current_commercial_median_department_race_group_gender_age5_hourly = commercial_hourly.groupby(['department','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_department_race_group_gender_age5_hourly)
department | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
Marketing | white | Female | 25-29 | 5.00 | 39.64 |
Finance | person of color | Female | 45-49 | 5.00 | 31.06 |
Production | person of color | Male | 65+ | 5.00 | 26.75 |
Production | person of color | Male | 45-49 | 6.00 | 24.67 |
current_commercial_median_job_salaried = commercial_salaried.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_salaried)
job_profile_current | count_nonzero | median |
---|---|---|
450220 - Sales Representative | 24 | 164497.80 |
551104 - Senior Financial Accountant | 5 | 96170.00 |
450120 - Account Manager | 19 | 93503.40 |
280428 - Designer - Content | 6 | 87000.30 |
481205 - Data Analyst | 8 | 84780.00 |
340227 - Artist | 5 | 77699.41 |
660127 - Make-Up Person | 5 | 75000.51 |
231303 - Client Service Manager | 14 | 73114.02 |
current_commercial_median_job_hourly = commercial_hourly.groupby(['job_profile_current']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_hourly)
job_profile_current | count_nonzero | median |
---|---|---|
240101 - Digital Marketing Specialist | 5 | 41.03 |
341027 - Desktop Publisher | 6 | 32.24 |
574504 - Senior Accounting Specialist | 16 | 31.68 |
565005 - Accounting Specialist | 7 | 28.74 |
470121 - Account Executive | 11 | 26.05 |
600318 - Circulation Driver (Class A) | 29 | 25.23 |
current_commercial_median_job_gender_salaried = commercial_salaried.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_gender_salaried)
job_profile_current | gender | count_nonzero | median |
---|---|---|---|
450220 - Sales Representative | Female | 20 | 162794.50 |
450120 - Account Manager | Male | 7 | 95691.99 |
450120 - Account Manager | Female | 12 | 91609.44 |
481205 - Data Analyst | Male | 6 | 82780.00 |
231303 - Client Service Manager | Female | 12 | 73114.02 |
current_commercial_median_job_gender_hourly = commercial_hourly.groupby(['job_profile_current','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_gender_hourly)
job_profile_current | gender | count_nonzero | median |
---|---|---|---|
574504 - Senior Accounting Specialist | Female | 12 | 31.82 |
470121 - Account Executive | Female | 10 | 25.77 |
600318 - Circulation Driver (Class A) | Male | 28 | 25.32 |
current_commercial_median_job_race_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_salaried)
job_profile_current | race_ethnicity | count_nonzero | median |
---|---|---|---|
450220 - Sales Representative | White (United States of America) | 23 | 163995.60 |
450120 - Account Manager | White (United States of America) | 11 | 107453.00 |
481205 - Data Analyst | White (United States of America) | 5 | 84780.00 |
450120 - Account Manager | Black or African American (United States of America) | 5 | 84169.27 |
231303 - Client Service Manager | White (United States of America) | 10 | 71229.54 |
current_commercial_median_job_race_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_hourly)
job_profile_current | race_ethnicity | count_nonzero | median |
---|---|---|---|
574504 - Senior Accounting Specialist | Black or African American (United States of America) | 11 | 32.08 |
565005 - Accounting Specialist | Black or African American (United States of America) | 5 | 29.01 |
600318 - Circulation Driver (Class A) | White (United States of America) | 6 | 25.77 |
470121 - Account Executive | Black or African American (United States of America) | 7 | 25.50 |
600318 - Circulation Driver (Class A) | Black or African American (United States of America) | 19 | 24.78 |
current_commercial_median_job_race_gender_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_salaried)
job_profile_current | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
450220 - Sales Representative | White (United States of America) | Female | 20 | 162794.50 |
450120 - Account Manager | White (United States of America) | Female | 7 | 100592.59 |
231303 - Client Service Manager | White (United States of America) | Female | 8 | 71229.54 |
current_commercial_median_job_race_gender_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_hourly)
job_profile_current | race_ethnicity | gender | count_nonzero | median |
---|---|---|---|---|
574504 - Senior Accounting Specialist | Black or African American (United States of America) | Female | 9 | 32.08 |
600318 - Circulation Driver (Class A) | White (United States of America) | Male | 6 | 25.77 |
470121 - Account Executive | Black or African American (United States of America) | Female | 7 | 25.50 |
600318 - Circulation Driver (Class A) | Black or African American (United States of America) | Male | 18 | 24.67 |
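# Grouped by job profile to match the variable name and the hourly cell that follows; the table below came from a run grouped by 'desk'.
current_commercial_median_job_race_group_gender_salaried = commercial_salaried.groupby(['job_profile_current','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})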
suppress_median(current_commercial_median_job_race_group_gender_salaried)
desk | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
non-newsroom | white | Female | 62 | 98047.80 |
non-newsroom | white | Male | 26 | 94693.21 |
non-newsroom | person of color | Female | 22 | 80879.99 |
non-newsroom | person of color | Male | 15 | 80000.56 |
current_commercial_median_job_race_group_gender_hourly = commercial_hourly.groupby(['job_profile_current','race_grouping','gender']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_hourly)
job_profile_current | race_grouping | gender | count_nonzero | median |
---|---|---|---|---|
574504 - Senior Accounting Specialist | person of color | Female | 10 | 31.57 |
600318 - Circulation Driver (Class A) | white | Male | 6 | 25.77 |
470121 - Account Executive | person of color | Female | 8 | 25.50 |
600318 - Circulation Driver (Class A) | person of color | Male | 21 | 25.41 |
current_commercial_median_job_race_gender_age5_salaried = commercial_salaried.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_age5_salaried)
job_profile_current | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
450220 - Sales Representative | White (United States of America) | Female | 35-39 | 9.00 | 163995.60 |
231303 - Client Service Manager | White (United States of America) | Female | 25-29 | 6.00 | 70448.15 |
current_commercial_median_job_race_gender_age5_hourly = commercial_hourly.groupby(['job_profile_current','race_ethnicity','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_gender_age5_hourly)
job_profile_current | race_ethnicity | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
574504 - Senior Accounting Specialist | Black or African American (United States of America) | Female | 45-49 | 5.00 | 31.06 |
600318 - Circulation Driver (Class A) | Black or African American (United States of America) | Male | 45-49 | 5.00 | 24.56 |
current_commercial_median_job_race_group_gender_age5_salaried = commercial_salaried.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_age5_salaried)
job_profile_current | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
450220 - Sales Representative | white | Female | 35-39 | 9.00 | 163995.60 |
231303 - Client Service Manager | white | Female | 25-29 | 6.00 | 70448.15 |
current_commercial_median_job_race_group_gender_age5_hourly = commercial_hourly.groupby(['job_profile_current','race_grouping','gender','age_group_5']).agg({'current_base_pay': [np.count_nonzero, np.median]})
suppress_median(current_commercial_median_job_race_group_gender_age5_hourly)
job_profile_current | race_grouping | gender | age_group_5 | count_nonzero | median |
---|---|---|---|---|---|
574504 - Senior Accounting Specialist | person of color | Female | 45-49 | 5.00 | 31.06 |
600318 - Circulation Driver (Class A) | person of color | Male | 45-49 | 5.00 | 24.56 |
commercial_ratings = ratings_combined[ratings_combined['dept'] == "Commercial"]
commercial_ratings_gender = commercial_ratings.groupby(['gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
commercial_ratings_gender
gender | performance_rating: count_nonzero | performance_rating: median |
---|---|---|
Female | 3952 | 3.30 |
Male | 2769 | 3.20 |
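The `count_nonzero` column throughout this study comes from passing `np.count_nonzero` to `agg`, so it is not strictly a row count: a zero value is skipped, while a missing value (NaN) counts as "nonzero" because NaN is not equal to 0. The made-up example below contrasts it with pandas' `count` (non-missing values) and `size` (all rows). Whether this changes any published figure depends on whether the underlying columns contain zeros or blanks, which this section does not show.

```python
import numpy as np
import pandas as pd

# Made-up ratings: one zero and two missing values.
ratings = pd.Series([3.2, 0.0, np.nan, np.nan, 3.5])

print(np.count_nonzero(ratings))  # 4: NaN is counted, 0.0 is not
print(ratings.count())            # 3: non-missing values, 0.0 included
print(ratings.size)               # 5: every row
```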
commercial_ratings_race = commercial_ratings.groupby(['race_ethnicity']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress_median(commercial_ratings_race)
race_ethnicity | count_nonzero | median |
---|---|---|
Native Hawaiian or Other Pacific Islander (United States of America) | 13 | 3.70 |
American Indian or Alaska Native (United States of America) | 13 | 3.40 |
Two or More Races (United States of America) | 195 | 3.40 |
Prefer Not to Disclose (United States of America) | 91 | 3.30 |
White (United States of America) | 3250 | 3.30 |
Asian (United States of America) | 546 | 3.20 |
Black or African American (United States of America) | 2327 | 3.20 |
Hispanic or Latino (United States of America) | 286 | 3.20 |
commercial_ratings_race_gender = commercial_ratings.groupby(['race_ethnicity','gender']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_ratings_race_gender)
race_ethnicity | gender | count_nonzero | median |
---|---|---|---|
American Indian or Alaska Native (United States of America) | Female | 13 | 3.40 |
Asian (United States of America) | Female | 377 | 3.30 |
Asian (United States of America) | Male | 169 | 3.15 |
Black or African American (United States of America) | Female | 1144 | 3.30 |
Black or African American (United States of America) | Male | 1183 | 3.10 |
Hispanic or Latino (United States of America) | Female | 156 | 3.20 |
Hispanic or Latino (United States of America) | Male | 130 | 3.20 |
Native Hawaiian or Other Pacific Islander (United States of America) | Female | 13 | 3.70 |
Prefer Not to Disclose (United States of America) | Female | 52 | 3.30 |
Prefer Not to Disclose (United States of America) | Male | 39 | 3.00 |
Two or More Races (United States of America) | Female | 117 | 3.50 |
Two or More Races (United States of America) | Male | 78 | 3.30 |
White (United States of America) | Female | 2080 | 3.30 |
White (United States of America) | Male | 1170 | 3.20 |
commercial_change = reason_for_change_combined[reason_for_change_combined['dept'] == 'Commercial']
commercial_change_gender = commercial_change.groupby(['business_process_reason','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_gender)
business_process_reason | gender | count_nonzero |
---|---|---|
Request Compensation Change > Adjustment > Contract Increase | Female | 720 |
Request Compensation Change > Adjustment > Contract Increase | Male | 530 |
Merit > Performance > Annual Performance Appraisal | Female | 437 |
Request Compensation Change > Adjustment > Change Plan Assignment | Female | 369 |
Merit > Performance > Annual Performance Appraisal | Male | 293 |
Request Compensation Change > Adjustment > Change Plan Assignment | Male | 187 |
Promotion > Promotion > Promotion | Female | 173 |
Transfer > Transfer > Move to another manager | Female | 121 |
Request Compensation Change > Adjustment > Market Adjustment | Female | 97 |
Hire Employee > New Hire > Fill Vacancy | Female | 93 |
Transfer > Transfer > Move to another manager | Male | 90 |
Data Change > Data Change > Change Job Details | Female | 87 |
Promotion > Promotion > Promotion | Male | 67 |
Request Compensation Change > Adjustment > Market Adjustment | Male | 59 |
Data Change > Data Change > Change Job Details | Male | 59 |
Hire Employee > New Hire > Fill Vacancy | Male | 58 |
Hire Employee > New Hire > New Position | Female | 43 |
Hire Employee > New Hire > New Position | Male | 30 |
Transfer > Transfer > Transfer between companies | Female | 18 |
Lateral Move > Lateral Move > Move to Another Position | Male | 15 |
Lateral Move > Lateral Move > Move to Another Position | Female | 14 |
Request Compensation Change > Adjustment > Increased Job Responsibilities | Male | 13 |
Request Compensation Change > Adjustment > Increased Job Responsibilities | Female | 12 |
Request Compensation Change > Adjustment > Job Change | Female | 9 |
Hire Employee > New Hire > Convert Contingent | Female | 9 |
Hire Employee > Rehire > Fill Vacancy | Male | 7 |
Request Compensation Change > Adjustment > Performance | Female | 7 |
Data Change > Data Change > Change Job Profile | Female | 7 |
Hire Employee > Rehire > Fill Vacancy | Female | 6 |
Hire Employee > New Hire > Convert Contingent | Male | 6 |
Request Compensation Change > Adjustment > Performance | Male | 6 |
Request Compensation Change > Adjustment > Job Change | Male | 5 |
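Raw counts of change records partly reflect how many women and men are in the unit (women outnumber men in most rows above), so comparing them directly can mislead. A sketch using `pd.crosstab` gives the counts in a wide layout and, with `normalize="columns"`, the share of each gender's records that falls under each reason. The data below are made up and the reason labels shortened; applied to the study, the inputs would be `commercial_change['business_process_reason']` and `commercial_change['gender']`.

```python
import pandas as pd

# Made-up change records with shortened reason labels.
changes = pd.DataFrame({
    "business_process_reason": ["Merit", "Merit", "Promotion", "Merit"],
    "gender": ["Female", "Male", "Female", "Female"],
})

# Rows: reasons; columns: genders; cells: number of records.
counts = pd.crosstab(changes["business_process_reason"], changes["gender"])

# Same layout, but each column sums to 1: the share of that gender's records per reason.
shares = pd.crosstab(changes["business_process_reason"], changes["gender"], normalize="columns")
print(counts)
print(shares)
```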
commercial_change_race = commercial_change[commercial_change['business_process_reason'] == 'Merit > Performance > Annual Performance Appraisal'].groupby(['business_process_reason','race_ethnicity']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_race)
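business_process_reason | race_ethnicity | count_nonzero |
---|---|---|
Merit > Performance > Annual Performance Appraisal | White (United States of America) | 318 |
Merit > Performance > Annual Performance Appraisal | Black or African American (United States of America) | 316 |
Merit > Performance > Annual Performance Appraisal | Asian (United States of America) | 46 |
Merit > Performance > Annual Performance Appraisal | Hispanic or Latino (United States of America) | 26 |
Merit > Performance > Annual Performance Appraisal | Two or More Races (United States of America) | 14 |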
commercial_change_race_gender = commercial_change[commercial_change['business_process_reason'] == 'Merit > Performance > Annual Performance Appraisal'].groupby(['business_process_reason','race_ethnicity','gender']).agg({'business_process_reason': [np.count_nonzero]})
suppress_count(commercial_change_race_gender)
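business_process_reason | race_ethnicity | gender | count_nonzero |
---|---|---|---|
Merit > Performance > Annual Performance Appraisal | White (United States of America) | Female | 198 |
Merit > Performance > Annual Performance Appraisal | Black or African American (United States of America) | Female | 173 |
Merit > Performance > Annual Performance Appraisal | Black or African American (United States of America) | Male | 143 |
Merit > Performance > Annual Performance Appraisal | White (United States of America) | Male | 120 |
Merit > Performance > Annual Performance Appraisal | Asian (United States of America) | Female | 28 |
Merit > Performance > Annual Performance Appraisal | Asian (United States of America) | Male | 18 |
Merit > Performance > Annual Performance Appraisal | Hispanic or Latino (United States of America) | Female | 18 |
Merit > Performance > Annual Performance Appraisal | Two or More Races (United States of America) | Female | 11 |
Merit > Performance > Annual Performance Appraisal | Hispanic or Latino (United States of America) | Male | 8 |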
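# case=False makes the match case-insensitive (the second positional argument of str.contains
# is `case`, not a regex flag); na=False treats a missing reason as not a merit raise.
reason_for_change_combined['merit_raises'] = reason_for_change_combined['business_process_reason'].str.contains('Merit', case=False, na=False)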
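# twenty14 through twenty18 hold April 1 of 2016 through 2020. A raise effective before
# a cutoff gets the label returned on that branch of raise_time.
twenty14 = np.datetime64('2016-04-01')
twenty15 = np.datetime64('2017-04-01')
twenty16 = np.datetime64('2018-04-01')
twenty17 = np.datetime64('2019-04-01')
twenty18 = np.datetime64('2020-04-01')
def raise_time(row):
    # Assumes effective_date has already been parsed to datetime64.
    # Raises effective on or after 2020-04-01, and rows with a missing date (NaT),
    # fail every comparison and are labelled 'unknown'.
    if row['effective_date'] < twenty14:
        return 'before 2015'
    if row['effective_date'] < twenty15:
        return '2015'
    if row['effective_date'] < twenty16:
        return '2016'
    if row['effective_date'] < twenty17:
        return '2017'
    if row['effective_date'] < twenty18:
        return '2018'
    return 'unknown'
reason_for_change_combined['raise_after'] = reason_for_change_combined.apply(raise_time, axis=1)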
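The row-wise `apply` calls a Python function once per row. The same binning can be written as column-wide comparisons with `np.select`, which picks the label of the first true condition and falls back to a default. This is a sketch assuming `effective_date` is already datetime64; `raise_period`, `CUTOFFS` and `LABELS` are hypothetical names, and the cutoffs and labels simply mirror `raise_time`.

```python
import numpy as np
import pandas as pd

# April 1 of 2016 through 2020, paired with the labels used by raise_time.
CUTOFFS = [np.datetime64(f"{year}-04-01") for year in range(2016, 2021)]
LABELS = ["before 2015", "2015", "2016", "2017", "2018"]


def raise_period(effective_date: pd.Series) -> np.ndarray:
    """Vectorized equivalent of raise_time.

    np.select returns the label of the first condition that is True.
    NaT compares False against every cutoff, so missing dates, like dates
    on or after 2020-04-01, get the default 'unknown'.
    """
    conditions = [effective_date < cutoff for cutoff in CUTOFFS]
    return np.select(conditions, LABELS, default="unknown")


dates = pd.Series(pd.to_datetime(["2015-06-01", "2016-04-01", "2019-12-31", "2020-05-01", None]))
periods = raise_period(dates)
```

Usage against the study's frame would be `reason_for_change_combined['raise_after'] = raise_period(reason_for_change_combined['effective_date'])`.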
merit_raises_commercial_gender_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
merit_raises_commercial_gender_salaried
gender | base_pay_change: count_nonzero | base_pay_change: median |
---|---|---|
Female | 170 | 1612.05 |
Male | 82 | 1281.37 |
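The merit filters in these cells depend on the `merit_raises` flag built with `Series.str.contains`. Its second positional parameter is `case`, not `flags`, so a regex flag passed positionally is read as a truthy `case` value and the match stays case-sensitive. The check below, on made-up strings, shows the keyword forms that do make the match case-insensitive; `na=False` keeps a missing reason out of the merit group.

```python
import re

import pandas as pd

# Made-up reasons; the last one is missing.
reasons = pd.Series([
    "Merit > Performance > Annual Performance Appraisal",
    "Promotion > Promotion > Promotion",
    None,
])

case_sensitive = reasons.str.contains("merit", na=False)                    # 'merit' != 'Merit'
via_case = reasons.str.contains("merit", case=False, na=False)              # keyword `case`
via_flags = reasons.str.contains("merit", flags=re.IGNORECASE, na=False)    # keyword `flags`
```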
merit_raises_commercial_gender_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender']).agg({'base_pay_change': [np.count_nonzero, np.median]})
merit_raises_commercial_gender_hourly
gender | base_pay_change count_nonzero | base_pay_change median |
---|---|---|
Female | 213 | 0.45 |
Male | 192 | 0.35 |
merit_raises_commercial_race_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_salaried)
race_ethnicity | count_nonzero | median |
---|---|---|
White (United States of America) | 166 | 1611.80 |
Asian (United States of America) | 30 | 1403.82 |
Black or African American (United States of America) | 42 | 1178.33 |
Two or More Races (United States of America) | 7 | 1000.00 |
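`suppress_median` is defined earlier in the notebook and is not shown in this section. Its output suggests three things: no group with fewer than five records appears, the stat names are flattened, and rows are ordered by median. Based only on that, a hypothetical stand-in might look like the sketch below. The threshold of five and the sort order are assumptions, not the notebook's confirmed logic.

```python
import pandas as pd

MIN_CELL = 5  # assumed cutoff; the notebook's actual threshold is set elsewhere


def suppress_median_sketch(table: pd.DataFrame, min_cell: int = MIN_CELL) -> pd.DataFrame:
    """Hypothetical stand-in for suppress_median.

    Flattens ('base_pay_change', stat) column pairs to just the stat name,
    drops groups with fewer than `min_cell` records, and sorts by median, highest first.
    """
    flat = table.copy()
    flat.columns = [stat for _, stat in flat.columns]
    flat = flat[flat['count_nonzero'] >= min_cell]
    return flat.sort_values('median', ascending=False)


# Made-up groups with the same two-level column layout as the groupby output above.
demo = pd.DataFrame(
    {('base_pay_change', 'count_nonzero'): [40, 12, 3],
     ('base_pay_change', 'median'): [1500.0, 1700.0, 2100.0]},
    index=pd.Index(['Group A', 'Group B', 'Group C'], name='race_ethnicity'),
)
result = suppress_median_sketch(demo)
print(result)  # Group C (3 records) is dropped; Group B is listed first
```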
merit_raises_commercial_race_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_ethnicity']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_hourly)
race_ethnicity | count_nonzero | median |
---|---|---|
Two or More Races (United States of America) | 5 | 0.58 |
White (United States of America) | 121 | 0.46 |
Hispanic or Latino (United States of America) | 16 | 0.43 |
Asian (United States of America) | 13 | 0.42 |
Black or African American (United States of America) | 246 | 0.35 |
merit_raises_commercial_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_group_salaried)
race_grouping | count_nonzero | median |
---|---|---|
white | 166 | 1611.80 |
person of color | 83 | 1225.00 |
merit_raises_commercial_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_race_group_hourly)
race_grouping | count_nonzero | median |
---|---|---|
white | 121 | 0.46 |
person of color | 283 | 0.37 |
merit_raises_commercial_gender_race_group_salaried = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_gender_race_group_salaried)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | white | 119 | 1719.50 |
Male | white | 47 | 1303.24 |
Female | person of color | 48 | 1273.68 |
Male | person of color | 35 | 1134.24 |
merit_raises_commercial_gender_race_group_hourly = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Hourly')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress_median(merit_raises_commercial_gender_race_group_hourly)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | white | 55 | 0.56 |
Male | white | 66 | 0.41 |
Female | person of color | 158 | 0.41 |
Male | person of color | 125 | 0.33 |
fifteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(fifteen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | white | 6 | 814.24 |
fifteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2015')].groupby(['gender','race_grouping']).agg({'2015_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(fifteen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | white | 6 | 3.45 |
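A second dict passed positionally to `agg` is not treated as another column specification. That is why only `base_pay_change` shows up in the raise tables, and the ratings need their own cell. Named aggregation can summarize both columns in one `groupby` call. The sketch below uses made-up rows, with `rating` standing in for a year's `*_annual_performance_rating` column.

```python
import pandas as pd

# Made-up rows; 'rating' stands in for e.g. '2015_annual_performance_rating'.
rows = pd.DataFrame({
    'gender': ['Female', 'Female', 'Male', 'Male', 'Male'],
    'race_grouping': ['white', 'white', 'person of color', 'person of color', 'person of color'],
    'base_pay_change': [1400.0, 1600.0, 900.0, 1000.0, 1100.0],
    'rating': [3.2, 3.4, 3.1, 3.3, 3.2],
})

# One output column per (source column, aggregation) pair.
by_group = rows.groupby(['gender', 'race_grouping']).agg(
    raises=('base_pay_change', 'size'),
    median_raise=('base_pay_change', 'median'),
    median_rating=('rating', 'median'),
)
print(by_group)
```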
sixteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(sixteen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 5 | 1729.40 |
Female | white | 6 | 1527.03 |
Male | person of color | 5 | 1442.77 |
Male | white | 5 | 1355.89 |
sixteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2016')].groupby(['gender','race_grouping']).agg({'2016_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(sixteen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 5 | 3.50 |
Female | white | 6 | 3.80 |
Male | person of color | 5 | 3.30 |
Male | white | 5 | 3.20 |
seventeen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(seventeen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | white | 10 | 1425.74 |
Male | person of color | 6 | 950.10 |
seventeen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2017')].groupby(['gender','race_grouping']).agg({'2017_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(seventeen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | white | 10 | 3.30 |
Male | person of color | 6 | 3.25 |
eighteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(eighteen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 8 | 1307.80 |
Female | white | 17 | 1844.28 |
Male | person of color | 5 | 1050.00 |
eighteen_raises = reason_for_change_combined[(reason_for_change_combined['merit_raises'] == True) & (reason_for_change_combined['dept'] == 'Commercial') & (reason_for_change_combined['pay_rate_type'] == 'Salaried') & (reason_for_change_combined['raise_after'] == '2018')].groupby(['gender','race_grouping']).agg({'2018_annual_performance_rating': [np.count_nonzero, np.median]})
suppress(eighteen_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 8 | 3.40 |
Female | white | 17 | 3.50 |
Male | person of color | 5 | 3.40 |
merit_raises_15 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2015') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_16 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2016') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_17 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2017') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_18 = reason_for_change_combined[(reason_for_change_combined['raise_after'] == '2018') & (reason_for_change_combined['merit_raises'] == True)]
merit_raises_15 = merit_raises_15[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2015_annual_performance_rating']].rename(columns={'2015_annual_performance_rating':'performance_rating'})
merit_raises_16 = merit_raises_16[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2016_annual_performance_rating']].rename(columns={'2016_annual_performance_rating':'performance_rating'})
merit_raises_17 = merit_raises_17[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2017_annual_performance_rating']].rename(columns={'2017_annual_performance_rating':'performance_rating'})
merit_raises_18 = merit_raises_18[['base_pay_change','pay_rate_type','gender','race_ethnicity','race_grouping','age_group_5','dept','tier','2018_annual_performance_rating']].rename(columns={'2018_annual_performance_rating':'performance_rating'})
merit_raises_15 = pd.DataFrame(merit_raises_15)
merit_raises_16 = pd.DataFrame(merit_raises_16)
merit_raises_17 = pd.DataFrame(merit_raises_17)
merit_raises_18 = pd.DataFrame(merit_raises_18)
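# Stack the four appraisal years. No department filter is applied above, so
# merit_raises_combined (and the "commercial_*" tables built from it below) covers all departments.
merit_raises_combined = pd.concat([merit_raises_15,merit_raises_16,merit_raises_17,merit_raises_18])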
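The four per-year blocks all follow one pattern. Each keeps the merit raises attributed to year N and renames that year's rating column to a common `performance_rating`. The compact sketch below applies the same reshaping to a made-up frame, using two years and a few columns for brevity. The `ignore_index=True` is a choice made here; the notebook's `concat` keeps the original row labels.

```python
import pandas as pd

# Made-up input: one row per pay change, with one rating column per appraisal year.
raises = pd.DataFrame({
    'raise_after': ['2015', '2016', '2016', 'unknown'],
    'merit_raises': [True, True, False, True],
    'base_pay_change': [1000.0, 1200.0, 500.0, 800.0],
    '2015_annual_performance_rating': [3.4, 3.0, 3.1, 3.5],
    '2016_annual_performance_rating': [3.6, 3.8, 3.2, 3.3],
})

per_year = []
for year in ['2015', '2016']:
    rating_col = f'{year}_annual_performance_rating'
    # Merit raises attributed to this appraisal year, paired with that year's rating only.
    subset = raises[(raises['raise_after'] == year) & raises['merit_raises']]
    per_year.append(
        subset[['base_pay_change', rating_col]].rename(columns={rating_col: 'performance_rating'})
    )

combined = pd.concat(per_year, ignore_index=True)
print(combined)
```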
commercial_salaried_raises = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Salaried'].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(commercial_salaried_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 108 | 2500.00 |
Female | unknown | 9 | 2720.00 |
Female | white | 289 | 2500.00 |
Male | person of color | 95 | 2059.93 |
Male | unknown | 6 | 3250.00 |
Male | white | 348 | 3000.00 |
commercial_salaried_raises_scores = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Salaried'].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_salaried_raises_scores)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 108 | 3.40 |
Female | unknown | 9 | 3.90 |
Female | white | 289 | 3.50 |
Male | person of color | 95 | 3.40 |
Male | unknown | 6 | 3.75 |
Male | white | 348 | 3.60 |
commercial_hourly_raises = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Hourly'].groupby(['gender','race_grouping']).agg({'base_pay_change': [np.count_nonzero, np.median]})
suppress(commercial_hourly_raises)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 102 | 0.38 |
Female | white | 80 | 0.94 |
Male | person of color | 100 | 0.34 |
Male | white | 63 | 0.45 |
commercial_hourly_raises_scores = merit_raises_combined[merit_raises_combined['pay_rate_type'] == 'Hourly'].groupby(['gender','race_grouping']).agg({'performance_rating': [np.count_nonzero, np.median]})
suppress(commercial_hourly_raises_scores)
gender | race_grouping | count_nonzero | median |
---|---|---|---|
Female | person of color | 102 | 3.30 |
Female | white | 80 | 3.50 |
Male | person of color | 100 | 3.20 |
Male | white | 63 | 3.30 |
commercial_salaried_regression = commercial_salaried[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
commercial_salaried_regression = pd.get_dummies(commercial_salaried_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])
commercial_salaried_regression = commercial_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
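The hand-written rename dictionary follows a few regular rules: `<25` becomes `25_under`, `65+` becomes `65_over`, `25-29` becomes `25to29`, and spaces become underscores. The sketch below encodes those rules in a function, so new dummy levels would not need their own dict entry. `patsy_safe` is a hypothetical name. Passing it as `DataFrame.rename(columns=patsy_safe)` would also rewrite every other column containing spaces, for example the `race_ethnicity_*` dummies. The dict leaves those alone, and even after renaming they would still contain parentheses.

```python
import re


def patsy_safe(name: str) -> str:
    """Hypothetical helper: turn a get_dummies column name into a formula-friendly one,
    using the same conventions as the hand-written rename dict."""
    name = re.sub(r'<(\d+)', r'\1_under', name)     # '<25'   -> '25_under'
    name = re.sub(r'(\d+)\+', r'\1_over', name)     # '65+'   -> '65_over'
    name = re.sub(r'(\d+)-(\d+)', r'\1to\2', name)  # '25-29' -> '25to29'
    return name.replace(' ', '_')                   # 'Tier 1' -> 'Tier_1'


for col in ['age_group_5_<25', 'age_group_5_25-29', 'age_group_5_65+',
            'tier_Tier 1', 'race_grouping_person of color', 'years_of_service_grouped_25+']:
    print(col, '->', patsy_safe(col))
```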
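# Note: this rebinds `sm` (imported earlier as statsmodels.api) to the formula API,
# which provides the lowercase ols(formula=..., data=...) constructor used below.
import statsmodels.formula.api as sm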
model41 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result41 = model41.fit()
result41.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.011 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.005 |
Method: | Least Squares | F-statistic: | 0.6759 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.511 |
Time: | 20:32:01 | Log-Likelihood: | -1494.7 |
No. Observations: | 126 | AIC: | 2995. |
Df Residuals: | 123 | BIC: | 3004. |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 6.689e+04 | 2201.909 | 30.377 | 0.000 | 6.25e+04 | 7.12e+04 |
gender_Female | 3.728e+04 | 3095.378 | 12.045 | 0.000 | 3.12e+04 | 4.34e+04 |
gender_Male | 2.96e+04 | 3828.922 | 7.732 | 0.000 | 2.2e+04 | 3.72e+04 |
Omnibus: | 23.432 | Durbin-Watson: | 1.617 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 30.119 |
Skew: | 1.176 | Prob(JB): | 2.88e-07 |
Kurtosis: | 3.456 | Cond. No. | 1.21e+15 |
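A `Cond. No.` of 1.21e+15 signals a singular design matrix. If every employee is coded as either `gender_Female` or `gender_Male`, the two dummies sum to the intercept column (the dummy-variable trap), and the three coefficients are not separately identified. statsmodels' default `pinv` fit still returns one of the many equivalent solutions. That is why the intercept here (about 66,890) is close to the sum of the two gender coefficients (37,280 + 29,600). The same applies wherever an intercept sits alongside a full set of category dummies, as in the age-group models.

The sketch below uses made-up salaries and plain NumPy. It shows the rank deficiency and the usual remedy of dropping one dummy as the reference group. In a statsmodels formula, writing the original categorical column as `C(gender)` applies treatment coding with one reference level automatically.

```python
import numpy as np

# Made-up salaries for five employees; 1 = female, 0 = male.
female = np.array([1, 1, 1, 0, 0])
male = 1 - female
salary = np.array([100_000.0, 110_000.0, 90_000.0, 95_000.0, 85_000.0])
intercept = np.ones_like(salary)

# Intercept plus BOTH dummies: female + male == intercept, so the matrix is rank-deficient.
X_trap = np.column_stack([intercept, female, male])
print(np.linalg.matrix_rank(X_trap))  # 2, not 3

# Drop one dummy (male becomes the reference group) and the design is full rank.
X_ref = np.column_stack([intercept, female])
coef, *_ = np.linalg.lstsq(X_ref, salary, rcond=None)
# Intercept = mean male salary (90,000); female coefficient = female mean minus male mean (10,000).
print(coef)
```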
model42 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result42 = model42.fit()
result42.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.099 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.085 |
Method: | Least Squares | F-statistic: | 6.794 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.00159 |
Time: | 20:32:01 | Log-Likelihood: | -1488.8 |
No. Observations: | 126 | AIC: | 2984. |
Df Residuals: | 123 | BIC: | 2992. |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.04e+05 | 3.31e+04 | 3.137 | 0.002 | 3.84e+04 | 1.7e+05 |
race_grouping_white | 4738.6158 | 3.33e+04 | 0.142 | 0.887 | -6.12e+04 | 7.07e+04 |
race_grouping_person_of_color | -1.92e+04 | 3.36e+04 | -0.571 | 0.569 | -8.57e+04 | 4.73e+04 |
Omnibus: | 20.566 | Durbin-Watson: | 1.586 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 25.036 |
Skew: | 1.063 | Prob(JB): | 3.66e-06 |
Kurtosis: | 3.497 | Cond. No. | 24.7 |
model43 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result43 = model43.fit()
result43.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.104 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.082 |
Method: | Least Squares | F-statistic: | 4.745 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.00364 |
Time: | 20:32:01 | Log-Likelihood: | -1488.4 |
No. Observations: | 126 | AIC: | 2985. |
Df Residuals: | 122 | BIC: | 2996. |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 6.758e+04 | 2.22e+04 | 3.040 | 0.003 | 2.36e+04 | 1.12e+05 |
gender_Female | 3.642e+04 | 1.13e+04 | 3.233 | 0.002 | 1.41e+04 | 5.87e+04 |
gender_Male | 3.116e+04 | 1.18e+04 | 2.630 | 0.010 | 7709.059 | 5.46e+04 |
race_grouping_white | 6291.0373 | 3.34e+04 | 0.188 | 0.851 | -5.99e+04 | 7.25e+04 |
race_grouping_person_of_color | -1.707e+04 | 3.37e+04 | -0.506 | 0.614 | -8.38e+04 | 4.97e+04 |
Omnibus: | 19.907 | Durbin-Watson: | 1.581 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 23.987 |
Skew: | 1.042 | Prob(JB): | 6.18e-06 |
Kurtosis: | 3.479 | Cond. No. | 1.44e+15 |
new_commercial_salaried_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})
new_commercial_salaried_regression['predicted'] = result43.predict(new_commercial_salaried_regression)
new_commercial_salaried_regression
| | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | age | predicted |
---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 40 | 110291.04 |
1 | 0 | 1 | 1 | 0 | 40 | 105036.69 |
2 | 1 | 0 | 0 | 1 | 40 | 86932.53 |
3 | 0 | 1 | 0 | 1 | 40 | 81678.18 |
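Each predicted value is the intercept plus the coefficients of whichever dummies are set to 1. The `age` column is carried along but ignored, since `model43` does not include it. Using the rounded coefficients from the `result43` summary:

$$
\begin{aligned}
\hat{y}_{\text{female, white}} &= \hat\beta_0 + \hat\beta_{\text{Female}} + \hat\beta_{\text{white}} \approx 67{,}580 + 36{,}420 + 6{,}291 = 110{,}291 \\
\hat{y}_{\text{male, person of color}} &= \hat\beta_0 + \hat\beta_{\text{Male}} + \hat\beta_{\text{person of color}} \approx 67{,}580 + 31{,}160 - 17{,}070 = 81{,}670
\end{aligned}
$$

The small differences from the table come from rounding. The model is additive with no interaction term, so the predicted female-minus-male gap, $\hat\beta_{\text{Female}} - \hat\beta_{\text{Male}} \approx 5{,}260$, is the same in both race groups. The table bears this out: the gap is 5,254.35 for both the white rows and the person-of-color rows.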
model44 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result44 = model44.fit()
result44.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.243 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.177 |
Method: | Least Squares | F-statistic: | 3.691 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000263 |
Time: | 20:32:01 | Log-Likelihood: | -1477.8 |
No. Observations: | 126 | AIC: | 2978. |
Df Residuals: | 115 | BIC: | 3009. |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 6.431e+04 | 2243.933 | 28.660 | 0.000 | 5.99e+04 | 6.88e+04 |
gender_Female | 3.499e+04 | 3475.108 | 10.068 | 0.000 | 2.81e+04 | 4.19e+04 |
gender_Male | 2.932e+04 | 3765.430 | 7.787 | 0.000 | 2.19e+04 | 3.68e+04 |
age_group_5_25_under | -1.534e+04 | 1.21e+04 | -1.263 | 0.209 | -3.94e+04 | 8720.201 |
age_group_5_25to29 | -1.306e+04 | 5801.246 | -2.252 | 0.026 | -2.46e+04 | -1573.642 |
age_group_5_30to34 | 8777.5553 | 8641.844 | 1.016 | 0.312 | -8340.274 | 2.59e+04 |
age_group_5_35to39 | 2.904e+04 | 7399.029 | 3.925 | 0.000 | 1.44e+04 | 4.37e+04 |
age_group_5_40to44 | 3.197e+04 | 1.2e+04 | 2.654 | 0.009 | 8105.201 | 5.58e+04 |
age_group_5_45to49 | 2.064e+04 | 9156.128 | 2.254 | 0.026 | 2503.649 | 3.88e+04 |
age_group_5_50to54 | -6459.9300 | 1.14e+04 | -0.568 | 0.571 | -2.9e+04 | 1.61e+04 |
age_group_5_55to59 | 1017.3191 | 8273.822 | 0.123 | 0.902 | -1.54e+04 | 1.74e+04 |
age_group_5_60to64 | -1076.6926 | 1.67e+04 | -0.064 | 0.949 | -3.42e+04 | 3.2e+04 |
age_group_5_65_over | 8804.5895 | 1.27e+04 | 0.696 | 0.488 | -1.63e+04 | 3.39e+04 |
Omnibus: | 8.578 | Durbin-Watson: | 1.762 |
---|---|---|---|
Prob(Omnibus): | 0.014 | Jarque-Bera (JB): | 8.412 |
Skew: | 0.612 | Prob(JB): | 0.0149 |
Kurtosis: | 3.321 | Cond. No. | 4.76e+15 |
model45 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result45 = model45.fit()
result45.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.346 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.282 |
Method: | Least Squares | F-statistic: | 5.473 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 5.85e-07 |
Time: | 20:32:01 | Log-Likelihood: | -1468.7 |
No. Observations: | 126 | AIC: | 2961. |
Df Residuals: | 114 | BIC: | 2995. |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 9.203e+04 | 2.77e+04 | 3.327 | 0.001 | 3.72e+04 | 1.47e+05 |
race_grouping_white | 9848.9444 | 3.05e+04 | 0.323 | 0.747 | -5.06e+04 | 7.02e+04 |
race_grouping_person_of_color | -1.569e+04 | 3.08e+04 | -0.509 | 0.611 | -7.67e+04 | 4.53e+04 |
age_group_5_25_under | -1.744e+04 | 1.17e+04 | -1.495 | 0.138 | -4.05e+04 | 5665.917 |
age_group_5_25to29 | -8438.9348 | 6003.368 | -1.406 | 0.163 | -2.03e+04 | 3453.691 |
age_group_5_30to34 | 1.197e+04 | 7761.857 | 1.542 | 0.126 | -3406.884 | 2.73e+04 |
age_group_5_35to39 | 3.035e+04 | 7435.141 | 4.082 | 0.000 | 1.56e+04 | 4.51e+04 |
age_group_5_40to44 | 3.507e+04 | 1.16e+04 | 3.013 | 0.003 | 1.2e+04 | 5.81e+04 |
age_group_5_45to49 | 3.036e+04 | 8837.574 | 3.435 | 0.001 | 1.29e+04 | 4.79e+04 |
age_group_5_50to54 | -5790.0262 | 1.09e+04 | -0.532 | 0.596 | -2.74e+04 | 1.58e+04 |
age_group_5_55to59 | -342.4178 | 8312.708 | -0.041 | 0.967 | -1.68e+04 | 1.61e+04 |
age_group_5_60to64 | 2968.0059 | 1.59e+04 | 0.187 | 0.852 | -2.85e+04 | 3.45e+04 |
age_group_5_65_over | 1.333e+04 | 1.17e+04 | 1.139 | 0.257 | -9845.176 | 3.65e+04 |
Omnibus: | 6.245 | Durbin-Watson: | 1.832 |
---|---|---|---|
Prob(Omnibus): | 0.044 | Jarque-Bera (JB): | 5.876 |
Skew: | 0.517 | Prob(JB): | 0.0530 |
Kurtosis: | 3.220 | Cond. No. | 6.36e+15 |
model46 = sm.ols(data=commercial_salaried_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result46 = model46.fit()
result46.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.346 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.277 |
Method: | Least Squares | F-statistic: | 4.987 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.37e-06 |
Time: | 20:32:01 | Log-Likelihood: | -1468.6 |
No. Observations: | 126 | AIC: | 2963. |
Df Residuals: | 113 | BIC: | 3000. |
Df Model: | 12 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 6.306e+04 | 1.91e+04 | 3.301 | 0.001 | 2.52e+04 | 1.01e+05 |
gender_Female | 3.259e+04 | 9988.518 | 3.263 | 0.001 | 1.28e+04 | 5.24e+04 |
gender_Male | 3.047e+04 | 1.02e+04 | 2.989 | 0.003 | 1.03e+04 | 5.07e+04 |
race_grouping_white | 9953.6974 | 3.06e+04 | 0.325 | 0.746 | -5.07e+04 | 7.06e+04 |
race_grouping_person_of_color | -1.533e+04 | 3.09e+04 | -0.496 | 0.621 | -7.66e+04 | 4.6e+04 |
age_group_5_25_under | -1.98e+04 | 1.16e+04 | -1.704 | 0.091 | -4.28e+04 | 3221.269 |
age_group_5_25to29 | -1.176e+04 | 5856.156 | -2.009 | 0.047 | -2.34e+04 | -162.446 |
age_group_5_30to34 | 8343.5878 | 8102.615 | 1.030 | 0.305 | -7709.155 | 2.44e+04 |
age_group_5_35to39 | 2.712e+04 | 7267.676 | 3.732 | 0.000 | 1.27e+04 | 4.15e+04 |
age_group_5_40to44 | 3.231e+04 | 1.15e+04 | 2.812 | 0.006 | 9548.083 | 5.51e+04 |
age_group_5_45to49 | 2.668e+04 | 8976.770 | 2.972 | 0.004 | 8893.475 | 4.45e+04 |
age_group_5_50to54 | -8079.1144 | 1.09e+04 | -0.743 | 0.459 | -2.96e+04 | 1.35e+04 |
age_group_5_55to59 | -3200.8446 | 8093.807 | -0.395 | 0.693 | -1.92e+04 | 1.28e+04 |
age_group_5_60to64 | -140.2857 | 1.58e+04 | -0.009 | 0.993 | -3.15e+04 | 3.12e+04 |
age_group_5_65_over | 1.159e+04 | 1.21e+04 | 0.961 | 0.339 | -1.23e+04 | 3.55e+04 |
Omnibus: | 6.419 | Durbin-Watson: | 1.826 |
---|---|---|---|
Prob(Omnibus): | 0.040 | Jarque-Bera (JB): | 6.039 |
Skew: | 0.522 | Prob(JB): | 0.0488 |
Kurtosis: | 3.245 | Cond. No. | 6.96e+15 |
merit_raises_combined_salaried_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'Commercial') & (merit_raises_combined['pay_rate_type'] == 'Salaried')]
merit_raises_combined_salaried_regression = pd.get_dummies(merit_raises_combined_salaried_regression, columns=['gender','race_grouping','age_group_5'])
merit_raises_combined_salaried_regression = merit_raises_combined_salaried_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model47 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result47 = model47.fit()
result47.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.065 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.055 |
Method: | Least Squares | F-statistic: | 6.387 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0132 |
Time: | 20:32:01 | Log-Likelihood: | -782.80 |
No. Observations: | 94 | AIC: | 1570. |
Df Residuals: | 92 | BIC: | 1575. |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 982.3188 | 72.396 | 13.569 | 0.000 | 838.535 | 1126.103 |
gender_Female | 765.5935 | 104.539 | 7.324 | 0.000 | 557.971 | 973.216 |
gender_Male | 216.7253 | 123.602 | 1.753 | 0.083 | -28.758 | 462.209 |
Omnibus: | 53.058 | Durbin-Watson: | 2.115 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 163.539 |
Skew: | 2.005 | Prob(JB): | 3.08e-36 |
Kurtosis: | 8.067 | Cond. No. | 7.78e+15 |
model48 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result48 = model48.fit()
result48.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.022 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.000 |
Method: | Least Squares | F-statistic: | 1.020 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.365 |
Time: | 20:32:01 | Log-Likelihood: | -784.91 |
No. Observations: | 94 | AIC: | 1576. |
Df Residuals: | 91 | BIC: | 1583. |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1400.0000 | 1040.448 | 1.346 | 0.182 | -666.723 | 3466.723 |
race_grouping_white | 281.2502 | 1050.038 | 0.268 | 0.789 | -1804.521 | 2367.021 |
race_grouping_person_of_color | -29.3659 | 1053.703 | -0.028 | 0.978 | -2122.417 | 2063.686 |
Omnibus: | 57.261 | Durbin-Watson: | 2.031 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 190.464 |
Skew: | 2.155 | Prob(JB): | 4.38e-42 |
Kurtosis: | 8.482 | Cond. No. | 20.7 |
model49 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result49 = model49.fit()
result49.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.075 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.044 |
Method: | Least Squares | F-statistic: | 2.434 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0700 |
Time: | 20:32:01 | Log-Likelihood: | -782.29 |
No. Observations: | 94 | AIC: | 1573. |
Df Residuals: | 90 | BIC: | 1583. |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 763.3484 | 682.375 | 1.119 | 0.266 | -592.309 | 2119.005 |
gender_Female | 636.6516 | 347.276 | 1.833 | 0.070 | -53.274 | 1326.577 |
gender_Male | 126.6968 | 370.633 | 0.342 | 0.733 | -609.631 | 863.024 |
race_grouping_white | 422.9043 | 1028.666 | 0.411 | 0.682 | -1620.721 | 2466.529 |
race_grouping_person_of_color | 219.0736 | 1036.139 | 0.211 | 0.833 | -1839.398 | 2277.545 |
Omnibus: | 51.713 | Durbin-Watson: | 2.131 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 156.955 |
Skew: | 1.951 | Prob(JB): | 8.27e-35 |
Kurtosis: | 7.985 | Cond. No. | 9.86e+15 |
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result49.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression
| | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | predicted |
---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 1822.90 |
1 | 0 | 1 | 1 | 0 | 1312.95 |
2 | 1 | 0 | 0 | 1 | 1619.07 |
3 | 0 | 1 | 0 | 1 | 1109.12 |
model50 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result50 = model50.fit()
result50.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.127 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.033 |
Method: | Least Squares | F-statistic: | 1.352 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.223 |
Time: | 20:32:01 | Log-Likelihood: | -779.60 |
No. Observations: | 94 | AIC: | 1579. |
Df Residuals: | 84 | BIC: | 1605. |
Df Model: | 9 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 868.0815 | 101.906 | 8.518 | 0.000 | 665.431 | 1070.732 |
gender_Female | 717.9315 | 122.709 | 5.851 | 0.000 | 473.912 | 961.951 |
gender_Male | 150.1500 | 129.752 | 1.157 | 0.250 | -107.877 | 408.177 |
age_group_5_25_under | 5.5e-14 | 2.48e-13 | 0.222 | 0.825 | -4.37e-13 | 5.47e-13 |
age_group_5_25to29 | -161.2607 | 331.113 | -0.487 | 0.628 | -819.716 | 497.194 |
age_group_5_30to34 | 248.7116 | 255.316 | 0.974 | 0.333 | -259.012 | 756.435 |
age_group_5_35to39 | 98.1318 | 297.637 | 0.330 | 0.742 | -493.753 | 690.017 |
age_group_5_40to44 | 587.8483 | 295.562 | 1.989 | 0.050 | 0.091 | 1175.606 |
age_group_5_45to49 | 520.5110 | 339.894 | 1.531 | 0.129 | -155.406 | 1196.428 |
age_group_5_50to54 | -27.9809 | 295.562 | -0.095 | 0.925 | -615.739 | 559.777 |
age_group_5_55to59 | -20.9810 | 307.555 | -0.068 | 0.946 | -632.588 | 590.626 |
age_group_5_60to64 | -358.6672 | 662.336 | -0.542 | 0.590 | -1675.794 | 958.460 |
age_group_5_65_over | -18.2315 | 932.753 | -0.020 | 0.984 | -1873.114 | 1836.651 |
Omnibus: | 44.764 | Durbin-Watson: | 2.013 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 116.271 |
Skew: | 1.730 | Prob(JB): | 5.65e-26 |
Kurtosis: | 7.209 | Cond. No. | 7.67e+16 |
model51 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result51 = model51.fit()
result51.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.095 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.014 |
Method: | Least Squares | F-statistic: | 0.8693 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.565 |
Time: | 20:32:01 | Log-Likelihood: | -781.27 |
No. Observations: | 94 | AIC: | 1585. |
Df Residuals: | 83 | BIC: | 1613. |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1499.1252 | 997.506 | 1.503 | 0.137 | -484.875 | 3483.125 |
race_grouping_white | -52.3007 | 1105.092 | -0.047 | 0.962 | -2250.284 | 2145.683 |
race_grouping_person_of_color | -469.6346 | 1126.458 | -0.417 | 0.678 | -2710.115 | 1770.845 |
age_group_5_25_under | 3.666e-13 | 6.68e-13 | 0.549 | 0.584 | -9.61e-13 | 1.69e-12 |
age_group_5_25to29 | -99.1252 | 340.998 | -0.291 | 0.772 | -777.356 | 579.106 |
age_group_5_30to34 | 371.9523 | 283.450 | 1.312 | 0.193 | -191.819 | 935.723 |
age_group_5_35to39 | 176.8397 | 331.634 | 0.533 | 0.595 | -482.766 | 836.446 |
age_group_5_40to44 | 701.2750 | 324.481 | 2.161 | 0.034 | 55.896 | 1346.654 |
age_group_5_45to49 | 685.5748 | 373.068 | 1.838 | 0.070 | -56.442 | 1427.591 |
age_group_5_50to54 | 149.6510 | 328.163 | 0.456 | 0.650 | -503.052 | 802.354 |
age_group_5_55to59 | 45.8178 | 338.269 | 0.135 | 0.893 | -626.985 | 718.621 |
age_group_5_60to64 | -503.3695 | 698.043 | -0.721 | 0.473 | -1891.749 | 885.010 |
age_group_5_65_over | -29.4906 | 968.266 | -0.030 | 0.976 | -1955.333 | 1896.352 |
Omnibus: | 49.612 | Durbin-Watson: | 1.991 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 143.081 |
Skew: | 1.886 | Prob(JB): | 8.52e-32 |
Kurtosis: | 7.722 | Cond. No. | 2.45e+16 |
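The adjusted R-squared of -0.014 in the model51 summary above is not an error. Adjusted R-squared charges a penalty for each model degree of freedom. It drops below zero when the raw R-squared (0.095 here) is too small to justify the 10 degrees of freedom used. The standard formula, for a model with an intercept, is 1 - (1 - R²)(n - 1)/df_resid, where df_resid = n - k - 1 and k is the reported "Df Model". The small function below checks that formula against the two summaries above. It uses the rounded R-squared values, so it only matches to about three decimals.

```python
def adjusted_r_squared(r_squared: float, nobs: int, df_resid: int) -> float:
    """Adjusted R-squared for an OLS model that includes an intercept.

    Equivalent to 1 - (1 - R^2) * (n - 1) / (n - k - 1), with
    df_resid = n - k - 1 and k the reported "Df Model".
    """
    return 1 - (1 - r_squared) * (nobs - 1) / df_resid


# Values copied from the two base_pay_change summaries above.
print(round(adjusted_r_squared(0.127, 94, 84), 3))  # 0.033 (gender + age)
print(round(adjusted_r_squared(0.095, 94, 83), 3))  # -0.014 (race + age, model51)
```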
model52 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result52 = model52.fit()
result52.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.148 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.034 |
Method: | Least Squares | F-statistic: | 1.296 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.242 |
Time: | 20:32:01 | Log-Likelihood: | -778.42 |
No. Observations: | 94 | AIC: | 1581. |
Df Residuals: | 82 | BIC: | 1611. |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 930.5934 | 672.984 | 1.383 | 0.170 | -408.187 | 2269.374 |
gender_Female | 729.0746 | 348.541 | 2.092 | 0.040 | 35.715 | 1422.434 |
gender_Male | 201.5188 | 363.462 | 0.554 | 0.581 | -521.523 | 924.561 |
race_grouping_white | 57.5424 | 1079.680 | 0.053 | 0.958 | -2090.284 | 2205.369 |
race_grouping_person_of_color | -293.2674 | 1102.198 | -0.266 | 0.791 | -2485.890 | 1899.355 |
age_group_5_25_under | -8.07e-14 | 1.3e-13 | -0.620 | 0.537 | -3.4e-13 | 1.78e-13 |
age_group_5_25to29 | -259.6680 | 338.569 | -0.767 | 0.445 | -933.190 | 413.854 |
age_group_5_30to34 | 208.1681 | 271.233 | 0.767 | 0.445 | -331.400 | 747.736 |
age_group_5_35to39 | 185.0484 | 315.193 | 0.587 | 0.559 | -441.972 | 812.069 |
age_group_5_40to44 | 603.0917 | 308.021 | 1.958 | 0.054 | -9.660 | 1215.843 |
age_group_5_45to49 | 605.3087 | 355.536 | 1.703 | 0.092 | -101.965 | 1312.582 |
age_group_5_50to54 | 41.2333 | 311.747 | 0.132 | 0.895 | -578.931 | 661.397 |
age_group_5_55to59 | -103.7665 | 324.559 | -0.320 | 0.750 | -749.417 | 541.884 |
age_group_5_60to64 | -509.9775 | 675.448 | -0.755 | 0.452 | -1853.660 | 833.705 |
age_group_5_65_over | 161.1552 | 944.143 | 0.171 | 0.865 | -1717.046 | 2039.356 |
Omnibus: | 42.404 | Durbin-Watson: | 2.058 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 109.020 |
Skew: | 1.628 | Prob(JB): | 2.12e-24 |
Kurtosis: | 7.152 | Cond. No. | 1.49e+17 |
model53 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result53 = model53.fit()
result53.summary()
Dep. Variable: | performance_rating | R-squared: | 0.004 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.007 |
Method: | Least Squares | F-statistic: | 0.3292 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.568 |
Time: | 20:32:01 | Log-Likelihood: | -26.919 |
No. Observations: | 92 | AIC: | 57.84 |
Df Residuals: | 90 | BIC: | 62.88 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 2.3005 | 0.024 | 96.855 | 0.000 | 2.253 | 2.348 |
gender_Female | 1.1707 | 0.034 | 34.208 | 0.000 | 1.103 | 1.239 |
gender_Male | 1.1298 | 0.041 | 27.818 | 0.000 | 1.049 | 1.210 |
Omnibus: | 2.489 | Durbin-Watson: | 1.823 |
---|---|---|---|
Prob(Omnibus): | 0.288 | Jarque-Bera (JB): | 2.472 |
Skew: | 0.383 | Prob(JB): | 0.291 |
Kurtosis: | 2.757 | Cond. No. | 2.74e+15 |
model54 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result54 = model54.fit()
result54.summary()
Dep. Variable: | performance_rating | R-squared: | 0.005 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.018 |
Method: | Least Squares | F-statistic: | 0.2113 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.810 |
Time: | 20:32:02 | Log-Likelihood: | -26.869 |
No. Observations: | 92 | AIC: | 59.74 |
Df Residuals: | 89 | BIC: | 67.30 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.4000 | 0.329 | 10.320 | 0.000 | 2.745 | 4.055 |
race_grouping_white | 0.0755 | 0.333 | 0.227 | 0.821 | -0.585 | 0.736 |
race_grouping_person_of_color | 0.0316 | 0.334 | 0.095 | 0.925 | -0.632 | 0.695 |
Omnibus: | 2.415 | Durbin-Watson: | 1.849 |
---|---|---|---|
Prob(Omnibus): | 0.299 | Jarque-Bera (JB): | 2.410 |
Skew: | 0.374 | Prob(JB): | 0.300 |
Kurtosis: | 2.740 | Cond. No. | 20.5 |
model55 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result55 = model55.fit()
result55.summary()
Dep. Variable: | performance_rating | R-squared: | 0.007 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.027 |
Method: | Least Squares | F-statistic: | 0.2059 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.892 |
Time: | 20:32:02 | Log-Likelihood: | -26.765 |
No. Observations: | 92 | AIC: | 61.53 |
Df Residuals: | 88 | BIC: | 71.62 |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 2.2556 | 0.222 | 10.159 | 0.000 | 1.814 | 2.697 |
gender_Female | 1.1444 | 0.113 | 10.121 | 0.000 | 0.920 | 1.369 |
gender_Male | 1.1112 | 0.121 | 9.188 | 0.000 | 0.871 | 1.352 |
race_grouping_white | 0.0842 | 0.335 | 0.252 | 0.802 | -0.581 | 0.749 |
race_grouping_person_of_color | 0.0482 | 0.337 | 0.143 | 0.887 | -0.622 | 0.719 |
Omnibus: | 2.301 | Durbin-Watson: | 1.833 |
---|---|---|---|
Prob(Omnibus): | 0.316 | Jarque-Bera (JB): | 2.285 |
Skew: | 0.368 | Prob(JB): | 0.319 |
Kurtosis: | 2.764 | Cond. No. | 3.22e+15 |
model56 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result56 = model56.fit()
result56.summary()
Dep. Variable: | performance_rating | R-squared: | 0.139 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.045 |
Method: | Least Squares | F-statistic: | 1.477 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.170 |
Time: | 20:32:02 | Log-Likelihood: | -20.177 |
No. Observations: | 92 | AIC: | 60.35 |
Df Residuals: | 82 | BIC: | 85.57 |
Df Model: | 9 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 2.1485 | 0.032 | 67.444 | 0.000 | 2.085 | 2.212 |
gender_Female | 1.0953 | 0.038 | 28.512 | 0.000 | 1.019 | 1.172 |
gender_Male | 1.0532 | 0.041 | 25.866 | 0.000 | 0.972 | 1.134 |
age_group_5_25_under | 1.693e-16 | 9.48e-17 | 1.787 | 0.078 | -1.92e-17 | 3.58e-16 |
age_group_5_25to29 | 0.0547 | 0.103 | 0.529 | 0.598 | -0.151 | 0.260 |
age_group_5_30to34 | 0.2218 | 0.081 | 2.748 | 0.007 | 0.061 | 0.382 |
age_group_5_35to39 | 0.2141 | 0.095 | 2.245 | 0.027 | 0.024 | 0.404 |
age_group_5_40to44 | 0.2725 | 0.092 | 2.954 | 0.004 | 0.089 | 0.456 |
age_group_5_45to49 | 0.2083 | 0.106 | 1.964 | 0.053 | -0.003 | 0.419 |
age_group_5_50to54 | 0.1263 | 0.092 | 1.370 | 0.175 | -0.057 | 0.310 |
age_group_5_55to59 | 0.4751 | 0.096 | 4.952 | 0.000 | 0.284 | 0.666 |
age_group_5_60to64 | 0.0773 | 0.207 | 0.374 | 0.709 | -0.334 | 0.488 |
age_group_5_65_over | 0.4983 | 0.291 | 1.713 | 0.091 | -0.080 | 1.077 |
Omnibus: | 4.627 | Durbin-Watson: | 1.906 |
---|---|---|---|
Prob(Omnibus): | 0.099 | Jarque-Bera (JB): | 4.258 |
Skew: | 0.526 | Prob(JB): | 0.119 |
Kurtosis: | 3.077 | Cond. No. | 8.80e+16 |
model57 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result57 = model57.fit()
result57.summary()
Dep. Variable: | performance_rating | R-squared: | 0.140 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.033 |
Method: | Least Squares | F-statistic: | 1.313 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.237 |
Time: | 20:32:02 | Log-Likelihood: | -20.174 |
No. Observations: | 92 | AIC: | 62.35 |
Df Residuals: | 81 | BIC: | 90.09 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.2369 | 0.306 | 10.589 | 0.000 | 2.629 | 3.845 |
race_grouping_white | -0.1183 | 0.339 | -0.349 | 0.728 | -0.792 | 0.555 |
race_grouping_person_of_color | -0.1539 | 0.346 | -0.445 | 0.657 | -0.841 | 0.534 |
age_group_5_25_under | 1.298e-15 | 3.56e-16 | 3.649 | 0.000 | 5.9e-16 | 2.01e-15 |
age_group_5_25to29 | 0.1631 | 0.105 | 1.557 | 0.123 | -0.045 | 0.371 |
age_group_5_30to34 | 0.3452 | 0.088 | 3.904 | 0.000 | 0.169 | 0.521 |
age_group_5_35to39 | 0.3414 | 0.105 | 3.240 | 0.002 | 0.132 | 0.551 |
age_group_5_40to44 | 0.3978 | 0.099 | 3.999 | 0.000 | 0.200 | 0.596 |
age_group_5_45to49 | 0.3384 | 0.114 | 2.958 | 0.004 | 0.111 | 0.566 |
age_group_5_50to54 | 0.2571 | 0.101 | 2.555 | 0.012 | 0.057 | 0.457 |
age_group_5_55to59 | 0.5956 | 0.104 | 5.737 | 0.000 | 0.389 | 0.802 |
age_group_5_60to64 | 0.1814 | 0.214 | 0.847 | 0.399 | -0.245 | 0.607 |
age_group_5_65_over | 0.6169 | 0.297 | 2.078 | 0.041 | 0.026 | 1.208 |
Omnibus: | 4.713 | Durbin-Watson: | 1.923 |
---|---|---|---|
Prob(Omnibus): | 0.095 | Jarque-Bera (JB): | 4.379 |
Skew: | 0.534 | Prob(JB): | 0.112 |
Kurtosis: | 3.056 | Cond. No. | 9.97e+16 |
model58 = sm.ols(data=merit_raises_combined_salaried_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result58 = model58.fit()
result58.summary()
Dep. Variable: | performance_rating | R-squared: | 0.142 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.024 |
Method: | Least Squares | F-statistic: | 1.205 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.297 |
Time: | 20:32:02 | Log-Likelihood: | -20.034 |
No. Observations: | 92 | AIC: | 64.07 |
Df Residuals: | 80 | BIC: | 94.33 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 2.2248 | 0.212 | 10.477 | 0.000 | 1.802 | 2.647 |
gender_Female | 1.1308 | 0.110 | 10.285 | 0.000 | 0.912 | 1.350 |
gender_Male | 1.0941 | 0.115 | 9.517 | 0.000 | 0.865 | 1.323 |
race_grouping_white | -0.1108 | 0.341 | -0.325 | 0.746 | -0.788 | 0.567 |
race_grouping_person_of_color | -0.1404 | 0.348 | -0.403 | 0.688 | -0.833 | 0.553 |
age_group_5_25_under | -1.047e-17 | 1.66e-16 | -0.063 | 0.950 | -3.41e-16 | 3.2e-16 |
age_group_5_25to29 | 0.0444 | 0.107 | 0.415 | 0.679 | -0.168 | 0.257 |
age_group_5_30to34 | 0.2267 | 0.087 | 2.609 | 0.011 | 0.054 | 0.400 |
age_group_5_35to39 | 0.2321 | 0.103 | 2.254 | 0.027 | 0.027 | 0.437 |
age_group_5_40to44 | 0.2830 | 0.097 | 2.911 | 0.005 | 0.090 | 0.476 |
age_group_5_45to49 | 0.2246 | 0.112 | 2.001 | 0.049 | 0.001 | 0.448 |
age_group_5_50to54 | 0.1414 | 0.098 | 1.436 | 0.155 | -0.055 | 0.337 |
age_group_5_55to59 | 0.4776 | 0.102 | 4.661 | 0.000 | 0.274 | 0.682 |
age_group_5_60to64 | 0.0735 | 0.213 | 0.345 | 0.731 | -0.351 | 0.498 |
age_group_5_65_over | 0.5215 | 0.298 | 1.751 | 0.084 | -0.071 | 1.114 |
Omnibus: | 4.573 | Durbin-Watson: | 1.918 |
---|---|---|---|
Prob(Omnibus): | 0.102 | Jarque-Bera (JB): | 4.209 |
Skew: | 0.523 | Prob(JB): | 0.122 |
Kurtosis: | 3.071 | Cond. No. | 6.67e+16 |
commercial_hourly_regression = commercial_hourly[['department','gender','race_ethnicity','current_base_pay','job_profile_current','cost_center_current','2015_annual_performance_rating','2016_annual_performance_rating','2017_annual_performance_rating','2018_annual_performance_rating','age','years_of_service','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping']]
commercial_hourly_regression = pd.get_dummies(commercial_hourly_regression, columns=['gender','race_ethnicity','age_group_5','years_of_service_grouped','dept','desk','tier','race_grouping'])
commercial_hourly_regression = commercial_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over','tier_Tier 1':'tier_Tier_1','tier_Tier 2':'tier_Tier_2','tier_Tier 3':'tier_Tier_3','tier_Tier 4':'tier_Tier_4','years_of_service_grouped_0':'years_of_service_grouped_0','years_of_service_grouped_1-2':'years_of_service_grouped_1to2','years_of_service_grouped_3-5':'years_of_service_grouped_3to5','years_of_service_grouped_6-10':'years_of_service_grouped_6to10','years_of_service_grouped_11-15':'years_of_service_grouped_11to15','years_of_service_grouped_16-20':'years_of_service_grouped_16to20','years_of_service_grouped_21-25':'years_of_service_grouped_21to25','years_of_service_grouped_25+':'years_of_service_grouped_25_over'})
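import statsmodels.formula.api as sm  # formula interface providing sm.ols; this rebinds the name sm, which the setup cell bound to statsmodels.api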
model59 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male')
result59 = model59.fit()
result59.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.091 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.084 |
Method: | Least Squares | F-statistic: | 13.47 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000348 |
Time: | 20:32:02 | Log-Likelihood: | -435.67 |
No. Observations: | 137 | AIC: | 875.3 |
Df Residuals: | 135 | BIC: | 881.2 |
Df Model: | 1 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 19.9208 | 0.335 | 59.396 | 0.000 | 19.258 | 20.584 |
gender_Female | 11.8069 | 0.515 | 22.927 | 0.000 | 10.788 | 12.825 |
gender_Male | 8.1139 | 0.545 | 14.883 | 0.000 | 7.036 | 9.192 |
Omnibus: | 7.871 | Durbin-Watson: | 1.510 |
---|---|---|---|
Prob(Omnibus): | 0.020 | Jarque-Bera (JB): | 7.596 |
Skew: | 0.511 | Prob(JB): | 0.0224 |
Kurtosis: | 3.536 | Cond. No. | 2.25e+15 |
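Before these models, the hand-written `rename` mapping converts dummy names such as `age_group_5_25-29` or `tier_Tier 1` into identifiers that a patsy formula can parse. The sketch below is a rule-based alternative. The function `sanitize_dummy_name` is hypothetical and was written only to reproduce that dictionary's mapping for the columns it lists. Unlike the dictionary, it would also rewrite any other column containing spaces or hyphens, such as `dept_` or `desk_` dummies, so check the resulting names before using them in formulas. Another option is patsy's `Q("...")` quoting, which references a column with an arbitrary name directly inside a formula.

```python
import re


def sanitize_dummy_name(col: str) -> str:
    """Turn a pd.get_dummies column name into a formula-safe identifier."""
    col = re.sub(r"_<(\d+)$", r"_\1_under", col)  # 'age_group_5_<25' -> 'age_group_5_25_under'
    col = re.sub(r"_(\d+)\+$", r"_\1_over", col)  # 'age_group_5_65+' -> 'age_group_5_65_over'
    col = re.sub(r"(\d+)-(\d+)", r"\1to\2", col)  # '25-29' -> '25to29'
    return col.replace(" ", "_")                  # 'person of color' -> 'person_of_color'


examples = [
    "age_group_5_<25",
    "age_group_5_25-29",
    "age_group_5_65+",
    "race_grouping_person of color",
    "tier_Tier 1",
    "years_of_service_grouped_25+",
    "gender_Female",
]
for name in examples:
    print(name, "->", sanitize_dummy_name(name))
```

`DataFrame.rename` accepts a callable for `columns`, so the function could be applied as `commercial_hourly_regression.rename(columns=sanitize_dummy_name)`.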
model60 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color')
result60 = model60.fit()
result60.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.101 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.087 |
Method: | Least Squares | F-statistic: | 7.504 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000814 |
Time: | 20:32:02 | Log-Likelihood: | -434.91 |
No. Observations: | 137 | AIC: | 875.8 |
Df Residuals: | 134 | BIC: | 884.6 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 22.9600 | 4.138 | 5.549 | 0.000 | 14.777 | 31.143 |
race_grouping_white | 9.8456 | 4.237 | 2.324 | 0.022 | 1.465 | 18.226 |
race_grouping_person_of_color | 6.0483 | 4.181 | 1.447 | 0.150 | -2.222 | 14.318 |
Omnibus: | 15.499 | Durbin-Watson: | 1.380 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 17.712 |
Skew: | 0.742 | Prob(JB): | 0.000143 |
Kurtosis: | 3.949 | Cond. No. | 18.1 |
model61 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result61 = model61.fit()
result61.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.181 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.162 |
Method: | Least Squares | F-statistic: | 9.778 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 7.12e-06 |
Time: | 20:32:02 | Log-Likelihood: | -428.53 |
No. Observations: | 137 | AIC: | 865.1 |
Df Residuals: | 133 | BIC: | 876.7 |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 16.4729 | 2.662 | 6.187 | 0.000 | 11.207 | 21.739 |
gender_Female | 9.9858 | 1.471 | 6.787 | 0.000 | 7.075 | 12.896 |
gender_Male | 6.4871 | 1.360 | 4.768 | 0.000 | 3.796 | 9.178 |
race_grouping_white | 7.8829 | 4.096 | 1.925 | 0.056 | -0.219 | 15.985 |
race_grouping_person_of_color | 4.1128 | 4.042 | 1.018 | 0.311 | -3.882 | 12.107 |
Omnibus: | 11.810 | Durbin-Watson: | 1.592 |
---|---|---|---|
Prob(Omnibus): | 0.003 | Jarque-Bera (JB): | 12.735 |
Skew: | 0.617 | Prob(JB): | 0.00172 |
Kurtosis: | 3.842 | Cond. No. | 2.77e+15 |
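The F-statistic in each summary tests whether all slope coefficients are jointly zero. For OLS with an intercept, it can be recovered from R-squared and the two reported degrees of freedom: F = (R²/k) / ((1 - R²)/df_resid). Here k is the "Df Model" value, which statsmodels derives from the rank of the design matrix. The sketch below checks the formula against the three commercial hourly summaries above (model59, model60 and model61). The small gaps come from R-squared being rounded to three decimals.

```python
def f_statistic(r_squared: float, df_model: int, df_resid: int) -> float:
    """Overall F-statistic for OLS with an intercept, recovered from R-squared.

    F = (R^2 / df_model) / ((1 - R^2) / df_resid)
    """
    return (r_squared / df_model) / ((1 - r_squared) / df_resid)


# (R-squared, Df Model, Df Residuals, reported F) copied from the summaries above.
reported = [(0.091, 1, 135, 13.47), (0.101, 2, 134, 7.504), (0.181, 3, 133, 9.778)]
for r2, k, dfr, f_reported in reported:
    print(round(f_statistic(r2, k, dfr), 2), "vs reported", f_reported)
```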
new_commercial_hourly_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1], 'age': [40,40,40,40]})  # 'age' is not a term in model61's formula, so predict() ignores it
new_commercial_hourly_regression['predicted'] = result61.predict(new_commercial_hourly_regression)
new_commercial_hourly_regression
Row | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | age | predicted |
---|---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 40 | 34.34 |
1 | 0 | 1 | 1 | 0 | 40 | 30.84 |
2 | 1 | 0 | 0 | 1 | 40 | 30.57 |
3 | 0 | 1 | 0 | 1 | 40 | 27.07 |
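Each predicted value in this table is the intercept plus the coefficients of the dummies set to 1 for that profile, using the `result61` estimates printed above. The sketch below reproduces the arithmetic by hand from those four-decimal coefficients. The `age` column plays no part because it is not a term in model61's formula.

```python
# Coefficients copied from the result61 summary (commercial hourly current_base_pay).
coef = {
    "Intercept": 16.4729,
    "gender_Female": 9.9858,
    "gender_Male": 6.4871,
    "race_grouping_white": 7.8829,
    "race_grouping_person_of_color": 4.1128,
}

# Same four profiles as new_commercial_hourly_regression (only dummies equal to 1 listed).
profiles = [
    {"gender_Female": 1, "race_grouping_white": 1},
    {"gender_Male": 1, "race_grouping_white": 1},
    {"gender_Female": 1, "race_grouping_person_of_color": 1},
    {"gender_Male": 1, "race_grouping_person_of_color": 1},
]


def predict(profile: dict) -> float:
    """Linear predictor: intercept plus coefficient * value for each listed dummy."""
    return coef["Intercept"] + sum(coef[name] * value for name, value in profile.items())


print([round(predict(p), 2) for p in profiles])  # [34.34, 30.84, 30.57, 27.07]
```

The gap between rows with the same race grouping (for example 34.34 - 30.84, about 3.50) equals the difference between the `gender_Female` and `gender_Male` coefficients (9.9858 - 6.4871 = 3.4987). Because both gender dummies enter alongside an intercept, this difference is the quantity the model actually identifies.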
model62 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result62 = model62.fit()
result62.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.156 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.089 |
Method: | Least Squares | F-statistic: | 2.323 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0152 |
Time: | 20:32:02 | Log-Likelihood: | -430.59 |
No. Observations: | 137 | AIC: | 883.2 |
Df Residuals: | 126 | BIC: | 915.3 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 18.5330 | 0.342 | 54.261 | 0.000 | 17.857 | 19.209 |
gender_Female | 10.9683 | 0.533 | 20.572 | 0.000 | 9.913 | 12.023 |
gender_Male | 7.5647 | 0.558 | 13.550 | 0.000 | 6.460 | 8.669 |
age_group_5_25_under | 0.4121 | 2.083 | 0.198 | 0.843 | -3.710 | 4.534 |
age_group_5_25to29 | 3.4150 | 1.125 | 3.037 | 0.003 | 1.189 | 5.641 |
age_group_5_30to34 | 4.1637 | 1.557 | 2.675 | 0.008 | 1.083 | 7.244 |
age_group_5_35to39 | 4.5093 | 1.933 | 2.333 | 0.021 | 0.685 | 8.334 |
age_group_5_40to44 | 2.0586 | 1.619 | 1.272 | 0.206 | -1.145 | 5.262 |
age_group_5_45to49 | 2.5494 | 1.313 | 1.942 | 0.054 | -0.048 | 5.147 |
age_group_5_50to54 | -0.8190 | 1.940 | -0.422 | 0.674 | -4.657 | 3.019 |
age_group_5_55to59 | 0.6862 | 1.313 | 0.523 | 0.602 | -1.911 | 3.284 |
age_group_5_60to64 | -0.0547 | 1.671 | -0.033 | 0.974 | -3.362 | 3.253 |
age_group_5_65_over | 1.6122 | 1.605 | 1.004 | 0.317 | -1.565 | 4.789 |
Omnibus: | 5.241 | Durbin-Watson: | 1.586 |
---|---|---|---|
Prob(Omnibus): | 0.073 | Jarque-Bera (JB): | 4.725 |
Skew: | 0.400 | Prob(JB): | 0.0942 |
Kurtosis: | 3.434 | Cond. No. | 4.01e+15 |
model63 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result63 = model63.fit()
result63.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.185 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.113 |
Method: | Least Squares | F-statistic: | 2.583 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.00547 |
Time: | 20:32:02 | Log-Likelihood: | -428.16 |
No. Observations: | 137 | AIC: | 880.3 |
Df Residuals: | 125 | BIC: | 915.4 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 20.3525 | 3.811 | 5.341 | 0.000 | 12.810 | 27.895 |
race_grouping_white | 10.2349 | 4.254 | 2.406 | 0.018 | 1.817 | 18.653 |
race_grouping_person_of_color | 6.3766 | 4.231 | 1.507 | 0.134 | -1.998 | 14.751 |
age_group_5_25_under | 1.0445 | 2.080 | 0.502 | 0.617 | -3.073 | 5.162 |
age_group_5_25to29 | 3.4080 | 1.152 | 2.959 | 0.004 | 1.129 | 5.687 |
age_group_5_30to34 | 4.9982 | 1.588 | 3.148 | 0.002 | 1.856 | 8.141 |
age_group_5_35to39 | 5.0974 | 1.965 | 2.594 | 0.011 | 1.208 | 8.987 |
age_group_5_40to44 | 1.8070 | 1.587 | 1.139 | 0.257 | -1.333 | 4.947 |
age_group_5_45to49 | 3.1001 | 1.371 | 2.261 | 0.025 | 0.387 | 5.814 |
age_group_5_50to54 | -1.1387 | 1.956 | -0.582 | 0.562 | -5.010 | 2.733 |
age_group_5_55to59 | 0.8308 | 1.362 | 0.610 | 0.543 | -1.865 | 3.527 |
age_group_5_60to64 | -0.1914 | 1.702 | -0.112 | 0.911 | -3.560 | 3.177 |
age_group_5_65_over | 1.3965 | 1.640 | 0.851 | 0.396 | -1.850 | 4.643 |
Omnibus: | 9.273 | Durbin-Watson: | 1.467 |
---|---|---|---|
Prob(Omnibus): | 0.010 | Jarque-Bera (JB): | 9.459 |
Skew: | 0.533 | Prob(JB): | 0.00883 |
Kurtosis: | 3.722 | Cond. No. | 4.70e+15 |
model64 = sm.ols(data=commercial_hourly_regression, formula = 'current_base_pay ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result64 = model64.fit()
result64.summary()
Dep. Variable: | current_base_pay | R-squared: | 0.251 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.179 |
Method: | Least Squares | F-statistic: | 3.466 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000202 |
Time: | 20:32:02 | Log-Likelihood: | -422.37 |
No. Observations: | 137 | AIC: | 870.7 |
Df Residuals: | 124 | BIC: | 908.7 |
Df Model: | 12 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 14.9662 | 2.539 | 5.895 | 0.000 | 9.941 | 19.991 |
gender_Female | 9.1237 | 1.416 | 6.445 | 0.000 | 6.322 | 11.926 |
gender_Male | 5.8425 | 1.308 | 4.466 | 0.000 | 3.253 | 8.432 |
race_grouping_white | 8.5728 | 4.125 | 2.078 | 0.040 | 0.409 | 16.737 |
race_grouping_person_of_color | 4.6437 | 4.106 | 1.131 | 0.260 | -3.484 | 12.771 |
age_group_5_25_under | -0.5216 | 2.000 | -0.261 | 0.795 | -4.479 | 3.436 |
age_group_5_25to29 | 2.4806 | 1.094 | 2.266 | 0.025 | 0.314 | 4.647 |
age_group_5_30to34 | 3.9869 | 1.506 | 2.648 | 0.009 | 1.006 | 6.968 |
age_group_5_35to39 | 4.7246 | 1.869 | 2.527 | 0.013 | 1.025 | 8.424 |
age_group_5_40to44 | 1.8219 | 1.538 | 1.184 | 0.239 | -1.223 | 4.867 |
age_group_5_45to49 | 2.6386 | 1.291 | 2.044 | 0.043 | 0.083 | 5.194 |
age_group_5_50to54 | -1.1102 | 1.870 | -0.594 | 0.554 | -4.811 | 2.591 |
age_group_5_55to59 | 0.3619 | 1.283 | 0.282 | 0.778 | -2.177 | 2.901 |
age_group_5_60to64 | -0.4255 | 1.617 | -0.263 | 0.793 | -3.626 | 2.775 |
age_group_5_65_over | 1.0089 | 1.555 | 0.649 | 0.518 | -2.068 | 4.086 |
Omnibus: | 8.226 | Durbin-Watson: | 1.675 |
---|---|---|---|
Prob(Omnibus): | 0.016 | Jarque-Bera (JB): | 8.103 |
Skew: | 0.506 | Prob(JB): | 0.0174 |
Kurtosis: | 3.628 | Cond. No. | 5.47e+15 |
merit_raises_combined_hourly_regression = merit_raises_combined[(merit_raises_combined['dept'] == 'Commercial') & (merit_raises_combined['pay_rate_type'] == 'Hourly')]
merit_raises_combined_hourly_regression = pd.get_dummies(merit_raises_combined_hourly_regression, columns=['gender','race_grouping','age_group_5'])
merit_raises_combined_hourly_regression = merit_raises_combined_hourly_regression.rename(columns={'race_grouping_person of color':'race_grouping_person_of_color','age_group_5_<25':'age_group_5_25_under','age_group_5_25-29':'age_group_5_25to29','age_group_5_30-34':'age_group_5_30to34','age_group_5_35-39':'age_group_5_35to39','age_group_5_40-44':'age_group_5_40to44','age_group_5_45-49':'age_group_5_45to49','age_group_5_50-54':'age_group_5_50to54','age_group_5_55-59':'age_group_5_55to59','age_group_5_60-64':'age_group_5_60to64','age_group_5_65+':'age_group_5_65_over'})
model65 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male')
result65 = model65.fit()
result65.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.057 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.049 |
Method: | Least Squares | F-statistic: | 7.107 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.00101 |
Time: | 20:32:02 | Log-Likelihood: | 34.616 |
No. Observations: | 240 | AIC: | -63.23 |
Df Residuals: | 237 | BIC: | -52.79 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -4.561e+11 | 3.07e+12 | -0.148 | 0.882 | -6.51e+12 | 5.6e+12 |
gender_Female | 4.561e+11 | 3.07e+12 | 0.148 | 0.882 | -5.6e+12 | 6.51e+12 |
gender_Male | 4.561e+11 | 3.07e+12 | 0.148 | 0.882 | -5.6e+12 | 6.51e+12 |
Omnibus: | 112.160 | Durbin-Watson: | 1.893 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 489.149 |
Skew: | 1.908 | Prob(JB): | 6.06e-107 |
Kurtosis: | 8.862 | Cond. No. | 4.79e+14 |
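The model65 table above shows offsetting coefficients of about ±4.561e+11, standard errors near 3.07e+12 and a condition number of 4.79e+14. These appear to be symptoms of perfect collinearity. Assuming every row has exactly one of `gender_Female` or `gender_Male` set, the two dummies sum to the intercept column. The individual coefficients are then not identified, and only combinations such as the Female-minus-Male difference are. statsmodels still returns estimates because `OLS.fit` defaults to a pseudoinverse solve. The sketch below uses synthetic numbers, not Post data, to show the rank deficiency with numpy and the usual fix: dropping one category as the reference level.

```python
import numpy as np

# Synthetic data: 3 women, 3 men, and a made-up outcome.
female = np.array([1, 1, 1, 0, 0, 0])
male = 1 - female
y = np.array([0.50, 0.55, 0.45, 0.30, 0.35, 0.25])
intercept = np.ones_like(y)

# Intercept + both dummies: intercept == female + male, so the three
# columns are linearly dependent and the design has rank 2, not 3.
X_full = np.column_stack([intercept, female, male])
print(np.linalg.matrix_rank(X_full))  # 2

# Drop one level (male) as the reference category: full column rank.
X_ref = np.column_stack([intercept, female])
print(np.linalg.matrix_rank(X_ref))  # 2

beta, *_ = np.linalg.lstsq(X_ref, y, rcond=None)
# beta[0] is the male mean; beta[1] is the female-minus-male difference.
print(np.round(beta, 3))  # [0.3 0.2]
```

In formula terms, this would mean writing something like `'base_pay_change ~ gender_Female'` instead of listing both gender dummies. The race and age dummy sets in these formulas raise the same issue and would each need one level left out.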
model66 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color')
result66 = model66.fit()
result66.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.038 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.030 |
Method: | Least Squares | F-statistic: | 4.669 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.0103 |
Time: | 20:32:02 | Log-Likelihood: | 32.263 |
No. Observations: | 240 | AIC: | -58.53 |
Df Residuals: | 237 | BIC: | -48.08 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 0.2777 | 0.010 | 27.077 | 0.000 | 0.258 | 0.298 |
race_grouping_white | 0.1859 | 0.018 | 10.170 | 0.000 | 0.150 | 0.222 |
race_grouping_person_of_color | 0.0919 | 0.014 | 6.628 | 0.000 | 0.065 | 0.119 |
Omnibus: | 106.312 | Durbin-Watson: | 1.948 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 402.969 |
Skew: | 1.857 | Prob(JB): | 3.14e-88 |
Kurtosis: | 8.148 | Cond. No. | 1.02e+15 |
model67 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result67 = model67.fit()
result67.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.100 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.088 |
Method: | Least Squares | F-statistic: | 8.726 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.64e-05 |
Time: | 20:32:02 | Log-Likelihood: | 40.249 |
No. Observations: | 240 | AIC: | -72.50 |
Df Residuals: | 236 | BIC: | -58.58 |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -2.4e+11 | 2.61e+12 | -0.092 | 0.927 | -5.39e+12 | 4.91e+12 |
gender_Female | 2.907e+11 | 3.16e+12 | 0.092 | 0.927 | -5.94e+12 | 6.52e+12 |
gender_Male | 2.907e+11 | 3.16e+12 | 0.092 | 0.927 | -5.94e+12 | 6.52e+12 |
race_grouping_white | -5.07e+10 | 5.52e+11 | -0.092 | 0.927 | -1.14e+12 | 1.04e+12 |
race_grouping_person_of_color | -5.07e+10 | 5.52e+11 | -0.092 | 0.927 | -1.14e+12 | 1.04e+12 |
Omnibus: | 94.745 | Durbin-Watson: | 1.980 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 324.494 |
Skew: | 1.670 | Prob(JB): | 3.44e-71 |
Kurtosis: | 7.614 | Cond. No. | 1.04e+15 |
new_reason_for_change_combined_regression = pd.DataFrame({'gender_Female': [1,0,1,0], 'gender_Male': [0,1,0,1], 'race_grouping_white': [1,1,0,0], 'race_grouping_person_of_color': [0,0,1,1]})
new_reason_for_change_combined_regression['predicted'] = result67.predict(new_reason_for_change_combined_regression)
new_reason_for_change_combined_regression
Row | gender_Female | gender_Male | race_grouping_white | race_grouping_person_of_color | predicted |
---|---|---|---|---|---|
0 | 1 | 0 | 1 | 0 | 0.52 |
1 | 0 | 1 | 1 | 0 | 0.42 |
2 | 1 | 0 | 0 | 1 | 0.42 |
3 | 0 | 1 | 0 | 1 | 0.32 |
model68 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result68 = model68.fit()
result68.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.136 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.098 |
Method: | Least Squares | F-statistic: | 3.594 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000188 |
Time: | 20:32:02 | Log-Likelihood: | 45.120 |
No. Observations: | 240 | AIC: | -68.24 |
Df Residuals: | 229 | BIC: | -29.95 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
Variable | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -2.181e+11 | 2.76e+12 | -0.079 | 0.937 | -5.66e+12 | 5.23e+12 |
gender_Female | 2.386e+11 | 3.02e+12 | 0.079 | 0.937 | -5.72e+12 | 6.19e+12 |
gender_Male | 2.386e+11 | 3.02e+12 | 0.079 | 0.937 | -5.72e+12 | 6.19e+12 |
age_group_5_25_under | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_25to29 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_30to34 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_35to39 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_40to44 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_45to49 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_50to54 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_55to59 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_60to64 | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
age_group_5_65_over | -2.051e+10 | 2.6e+11 | -0.079 | 0.937 | -5.33e+11 | 4.92e+11 |
Omnibus: | 100.582 | Durbin-Watson: | 1.864 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 390.178 |
Skew: | 1.727 | Prob(JB): | 1.88e-85 |
Kurtosis: | 8.204 | Cond. No. | 2.82e+15 |
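Model68 includes the intercept together with both gender_Female and gender_Male and all ten age_group_5_* bands. Suppose every row falls in exactly one gender category and exactly one age band. (That is our assumption about how these columns were built.) Then each set of dummies sums to the intercept column, so the design matrix is rank-deficient and no individual coefficient is identified. The Cond. No. of 2.82e+15 and the identical estimates for every level are consistent with this. The toy sketch below uses hypothetical data and compares column count against matrix rank in two designs: one with every dummy kept, and one with a reference category dropped per factor.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every row has exactly one gender and one age band.
df = pd.DataFrame({
    "gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "age_band": ["25to29", "25to29", "30to34", "30to34", "35to39", "35to39"],
})

# Full set of dummies for each factor, mirroring the formulas above
full = pd.get_dummies(df, columns=["gender", "age_band"], dtype=float)
X_full = np.column_stack([np.ones(len(df)), full.to_numpy()])

# One reference category dropped per factor (gender_Female, age_band_25to29)
ref = pd.get_dummies(df, columns=["gender", "age_band"], drop_first=True, dtype=float)
X_ref = np.column_stack([np.ones(len(df)), ref.to_numpy()])

print(X_full.shape[1], np.linalg.matrix_rank(X_full))  # 6 columns, rank 4
print(X_ref.shape[1], np.linalg.matrix_rank(X_ref))    # 4 columns, rank 4
```

The full design loses one rank for each factor whose dummies sum to the constant. Dropping one level per factor removes both dependencies without discarding any information.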
model69 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result69 = model69.fit()
result69.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.119 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.081 |
Method: | Least Squares | F-statistic: | 3.105 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000988 |
Time: | 20:32:02 | Log-Likelihood: | 42.885 |
No. Observations: | 240 | AIC: | -63.77 |
Df Residuals: | 229 | BIC: | -25.48 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 1.727e+12 | 4.54e+12 | 0.381 | 0.704 | -7.21e+12 | 1.07e+13 |
race_grouping_white | -2.165e+12 | 5.69e+12 | -0.381 | 0.704 | -1.34e+13 | 9.04e+12 |
race_grouping_person_of_color | -2.165e+12 | 5.69e+12 | -0.381 | 0.704 | -1.34e+13 | 9.04e+12 |
age_group_5_25_under | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_25to29 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_30to34 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_35to39 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_40to44 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_45to49 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_50to54 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_55to59 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_60to64 | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
age_group_5_65_over | 4.382e+11 | 1.15e+12 | 0.381 | 0.704 | -1.83e+12 | 2.71e+12 |
Omnibus: | 94.296 | Durbin-Watson: | 1.922 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 322.520 |
Skew: | 1.662 | Prob(JB): | 9.24e-71 |
Kurtosis: | 7.605 | Cond. No. | 3.55e+15 |
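Model69 pairs race_grouping_white with race_grouping_person_of_color, so its individual coefficients are not identified either. Some combinations of them still are. In a rank-deficient least-squares fit, the fitted values are unique, and so is the difference between two levels of the same factor. That difference equals the single coefficient you get under reference coding. The sketch below checks this on hypothetical synthetic data, with numpy's minimum-norm `lstsq` solution standing in for the statsmodels fit.

```python
import numpy as np

# Hypothetical synthetic data (not the Post data)
rng = np.random.default_rng(1)
n = 300
white = rng.integers(0, 2, n).astype(float)
age_band = rng.integers(0, 3, n)      # three hypothetical age bands
age_dummies = np.eye(3)[age_band]     # one-hot: each row sums to 1
y = 0.03 + 0.01 * white + 0.005 * age_band + rng.normal(0, 0.01, n)

# Rank-deficient design: intercept + both race dummies + every age dummy
X_full = np.column_stack([np.ones(n), white, 1 - white, age_dummies])
b_full, _, rank_full, _ = np.linalg.lstsq(X_full, y, rcond=None)

# Reference-coded design: person of color and the first age band are baselines
X_ref = np.column_stack([np.ones(n), white, age_dummies[:, 1:]])
b_ref, *_ = np.linalg.lstsq(X_ref, y, rcond=None)

contrast_full = b_full[1] - b_full[2]  # white minus person of color
print(rank_full, X_full.shape[1])      # rank is below the column count
print(contrast_full, b_ref[1])         # the estimable contrast matches
```
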
model70 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'base_pay_change ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result70 = model70.fit()
result70.summary()
Dep. Variable: | base_pay_change | R-squared: | 0.166 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.125 |
Method: | Least Squares | F-statistic: | 4.113 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.57e-05 |
Time: | 20:32:02 | Log-Likelihood: | 49.350 |
No. Observations: | 240 | AIC: | -74.70 |
Df Residuals: | 228 | BIC: | -32.93 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | -4.648e+11 | 2.37e+12 | -0.196 | 0.845 | -5.14e+12 | 4.21e+12 |
gender_Female | 6.083e+11 | 3.1e+12 | 0.196 | 0.845 | -5.51e+12 | 6.72e+12 |
gender_Male | 6.083e+11 | 3.1e+12 | 0.196 | 0.845 | -5.51e+12 | 6.72e+12 |
race_grouping_white | -1.043e+11 | 5.32e+11 | -0.196 | 0.845 | -1.15e+12 | 9.45e+11 |
race_grouping_person_of_color | -1.043e+11 | 5.32e+11 | -0.196 | 0.845 | -1.15e+12 | 9.45e+11 |
age_group_5_25_under | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_25to29 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_30to34 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_35to39 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_40to44 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_45to49 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_50to54 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_55to59 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_60to64 | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
age_group_5_65_over | -3.917e+10 | 2e+11 | -0.196 | 0.845 | -4.33e+11 | 3.55e+11 |
Omnibus: | 88.933 | Durbin-Watson: | 1.924 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 290.286 |
Skew: | 1.576 | Prob(JB): | 9.23e-64 |
Kurtosis: | 7.369 | Cond. No. | 3.98e+15 |
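For model70, statsmodels reports a Cond. No. of 3.98e+15, which signals a numerically singular design. One way to catch this before fitting is to check whether any group of dummy columns sums to 1 on every row, because such a group duplicates the intercept. The sketch below does that check with a hypothetical helper, `dummy_sets_that_span_intercept`, which is our own name and not part of the original analysis. The column prefixes follow this notebook's naming, and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def dummy_sets_that_span_intercept(df, prefixes):
    """Return the prefixes whose dummy columns sum to 1 on every row.

    Such a set is perfectly collinear with the intercept, so one of its
    columns has to be dropped (used as the reference) in a model with a constant.
    """
    flagged = []
    for prefix in prefixes:
        cols = [c for c in df.columns if c.startswith(prefix)]
        if cols and np.allclose(df[cols].sum(axis=1), 1.0):
            flagged.append(prefix)
    return flagged


# Hypothetical toy rows using this notebook's column names
toy = pd.DataFrame({
    "gender_Female": [1, 0, 1, 0, 1, 0],
    "gender_Male": [0, 1, 0, 1, 0, 1],
    "race_grouping_white": [1, 1, 0, 0, 1, 0],
    "race_grouping_person_of_color": [0, 0, 1, 1, 0, 1],
})
flagged = dummy_sets_that_span_intercept(toy, ["gender_", "race_grouping_"])
print(flagged)

# Condition number with every dummy kept vs. one reference dropped per group
X_full = np.column_stack([np.ones(len(toy)), toy.to_numpy(dtype=float)])
X_ref = np.column_stack(
    [np.ones(len(toy)), toy[["gender_Female", "race_grouping_white"]].to_numpy(dtype=float)]
)
print(np.linalg.cond(X_full), np.linalg.cond(X_ref))
```
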
model71 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male')
result71 = model71.fit()
result71.summary()
Dep. Variable: | performance_rating | R-squared: | -0.054 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.063 |
Method: | Least Squares | F-statistic: | -6.036 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.00 |
Time: | 20:32:03 | Log-Likelihood: | -20.790 |
No. Observations: | 239 | AIC: | 47.58 |
Df Residuals: | 236 | BIC: | 58.01 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 4.178e+12 | 5.09e+12 | 0.821 | 0.412 | -5.85e+12 | 1.42e+13 |
gender_Female | -4.178e+12 | 5.09e+12 | -0.821 | 0.412 | -1.42e+13 | 5.85e+12 |
gender_Male | -4.178e+12 | 5.09e+12 | -0.821 | 0.412 | -1.42e+13 | 5.85e+12 |
Omnibus: | 14.859 | Durbin-Watson: | 1.508 |
---|---|---|---|
Prob(Omnibus): | 0.001 | Jarque-Bera (JB): | 16.505 |
Skew: | 0.642 | Prob(JB): | 0.000261 |
Kurtosis: | 2.907 | Cond. No. | 6.28e+14 |
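For an exact least-squares fit that includes an intercept, R-squared lies between 0 and 1 and the F-statistic cannot be negative. The negative values reported for model71 (and for model72) therefore point to numerical trouble in the fit, not to a substantive finding. Our reading is that this follows from the near-singular design (Cond. No. 6.28e+14). With one gender dummy as the reference, the regression reduces to a comparison of group means: the intercept is the reference group's mean and the coefficient is the difference between the two means. The ratings below are hypothetical.

```python
import numpy as np

# Hypothetical performance ratings for two groups (not the Post data)
ratings_female = np.array([3.0, 4.0, 3.5, 4.0])
ratings_male = np.array([3.0, 3.5, 3.0, 3.5])
y = np.concatenate([ratings_female, ratings_male])
female = np.concatenate([np.ones(4), np.zeros(4)])

X = np.column_stack([np.ones_like(y), female])  # male is the reference group
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
centered = y - y.mean()
r_squared = 1 - (resid @ resid) / (centered @ centered)
print(beta)       # [3.25, 0.375]: male mean, female-minus-male gap
print(r_squared)  # 3/13, about 0.231
```
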
model72 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color')
result72 = model72.fit()
result72.summary()
Dep. Variable: | performance_rating | R-squared: | -0.023 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.031 |
Method: | Least Squares | F-statistic: | -2.609 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 1.00 |
Time: | 20:32:03 | Log-Likelihood: | -17.186 |
No. Observations: | 239 | AIC: | 40.37 |
Df Residuals: | 236 | BIC: | 50.80 |
Df Model: | 2 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 5.618e+12 | 5.04e+12 | 1.115 | 0.266 | -4.3e+12 | 1.55e+13 |
race_grouping_white | -5.618e+12 | 5.04e+12 | -1.115 | 0.266 | -1.55e+13 | 4.3e+12 |
race_grouping_person_of_color | -5.618e+12 | 5.04e+12 | -1.115 | 0.266 | -1.55e+13 | 4.3e+12 |
Omnibus: | 13.647 | Durbin-Watson: | 1.601 |
---|---|---|---|
Prob(Omnibus): | 0.001 | Jarque-Bera (JB): | 14.973 |
Skew: | 0.604 | Prob(JB): | 0.000560 |
Kurtosis: | 2.788 | Cond. No. | 6.59e+14 |
model73 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color')
result73 = model73.fit()
result73.summary()
Dep. Variable: | performance_rating | R-squared: | 0.069 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.057 |
Method: | Least Squares | F-statistic: | 5.825 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.000744 |
Time: | 20:32:03 | Log-Likelihood: | -5.9430 |
No. Observations: | 239 | AIC: | 19.89 |
Df Residuals: | 235 | BIC: | 33.79 |
Df Model: | 3 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 3.635e+12 | 4.99e+12 | 0.729 | 0.467 | -6.19e+12 | 1.35e+13 |
gender_Female | -1.775e+12 | 2.43e+12 | -0.729 | 0.467 | -6.57e+12 | 3.02e+12 |
gender_Male | -1.775e+12 | 2.43e+12 | -0.729 | 0.467 | -6.57e+12 | 3.02e+12 |
race_grouping_white | -1.861e+12 | 2.55e+12 | -0.729 | 0.467 | -6.89e+12 | 3.17e+12 |
race_grouping_person_of_color | -1.861e+12 | 2.55e+12 | -0.729 | 0.467 | -6.89e+12 | 3.17e+12 |
Omnibus: | 16.513 | Durbin-Watson: | 1.714 |
---|---|---|---|
Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 18.001 |
Skew: | 0.661 | Prob(JB): | 0.000123 |
Kurtosis: | 3.248 | Cond. No. | 1.72e+15 |
model74 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result74 = model74.fit()
result74.summary()
Dep. Variable: | performance_rating | R-squared: | 0.015 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | -0.028 |
Method: | Least Squares | F-statistic: | 0.3433 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.968 |
Time: | 20:32:03 | Log-Likelihood: | -12.729 |
No. Observations: | 239 | AIC: | 47.46 |
Df Residuals: | 228 | BIC: | 85.70 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 4.753e+12 | 4.48e+12 | 1.061 | 0.290 | -4.07e+12 | 1.36e+13 |
gender_Female | -5.392e+12 | 5.08e+12 | -1.061 | 0.290 | -1.54e+13 | 4.62e+12 |
gender_Male | -5.392e+12 | 5.08e+12 | -1.061 | 0.290 | -1.54e+13 | 4.62e+12 |
age_group_5_25_under | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_25to29 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_30to34 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_35to39 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_40to44 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_45to49 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_50to54 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_55to59 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_60to64 | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
age_group_5_65_over | 6.392e+11 | 6.02e+11 | 1.061 | 0.290 | -5.48e+11 | 1.83e+12 |
Omnibus: | 9.682 | Durbin-Watson: | 1.513 |
---|---|---|---|
Prob(Omnibus): | 0.008 | Jarque-Bera (JB): | 10.171 |
Skew: | 0.504 | Prob(JB): | 0.00619 |
Kurtosis: | 2.942 | Cond. No. | 3.41e+15 |
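Model74 adds the age bands to the gender terms. Whether age matters as a block is a joint question, and the standard way to answer it is a nested F-test. The test compares the residual sum of squares of the model without the extra dummies (restricted) to the model with them (unrestricted): F = ((SSR_r - SSR_u) / q) / (SSR_u / (n - k)). Here q is the number of added columns and k is the number of columns in the larger model. The function below is a minimal numpy sketch of that formula. It assumes both designs have full column rank, so it would need reference-coded dummies rather than the full sets used above. The data are hypothetical.

```python
import numpy as np


def nested_f_test(X_restricted, X_full, y):
    """F statistic for H0: the extra columns in X_full all have zero coefficients.

    Assumes both designs have full column rank and that every column of
    X_restricted lies in the span of X_full's columns.
    """
    def ssr(X):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return resid @ resid

    n, k_full = X_full.shape
    q = k_full - X_restricted.shape[1]  # number of restrictions tested
    ssr_r, ssr_u = ssr(X_restricted), ssr(X_full)
    return ((ssr_r - ssr_u) / q) / (ssr_u / (n - k_full))


# Hypothetical data with a real age effect
rng = np.random.default_rng(2)
n = 300
female = rng.integers(0, 2, n).astype(float)
age_band = rng.integers(0, 3, n)
age_dummies = np.eye(3)[age_band][:, 1:]  # first band is the reference
y = 1.0 + 0.5 * female + np.array([0.0, 1.0, 2.0])[age_band] + rng.normal(0, 0.5, n)

X_restricted = np.column_stack([np.ones(n), female])
X_full = np.column_stack([X_restricted, age_dummies])
f_stat = nested_f_test(X_restricted, X_full, y)
print(round(f_stat, 1))
```

The p-value comes from an F distribution with (q, n - k) degrees of freedom. With statsmodels results objects, `result_full.compare_f_test(result_restricted)` returns the F statistic, its p-value, and the difference in degrees of freedom for the same comparison.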
model75 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result75 = model75.fit()
result75.summary()
Dep. Variable: | performance_rating | R-squared: | 0.098 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.059 |
Method: | Least Squares | F-statistic: | 2.481 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.00770 |
Time: | 20:32:03 | Log-Likelihood: | -2.1689 |
No. Observations: | 239 | AIC: | 26.34 |
Df Residuals: | 228 | BIC: | 64.58 |
Df Model: | 10 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 6.591e+12 | 4.41e+12 | 1.495 | 0.136 | -2.09e+12 | 1.53e+13 |
race_grouping_white | -7.266e+12 | 4.86e+12 | -1.495 | 0.136 | -1.68e+13 | 2.31e+12 |
race_grouping_person_of_color | -7.266e+12 | 4.86e+12 | -1.495 | 0.136 | -1.68e+13 | 2.31e+12 |
age_group_5_25_under | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_25to29 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_30to34 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_35to39 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_40to44 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_45to49 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_50to54 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_55to59 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_60to64 | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
age_group_5_65_over | 6.747e+11 | 4.51e+11 | 1.495 | 0.136 | -2.14e+11 | 1.56e+12 |
Omnibus: | 8.581 | Durbin-Watson: | 1.662 |
---|---|---|---|
Prob(Omnibus): | 0.014 | Jarque-Bera (JB): | 8.884 |
Skew: | 0.472 | Prob(JB): | 0.0118 |
Kurtosis: | 2.962 | Cond. No. | 3.39e+15 |
model76 = sm.ols(data=merit_raises_combined_hourly_regression, formula = 'performance_rating ~ gender_Female + gender_Male + race_grouping_white + race_grouping_person_of_color + age_group_5_25_under + age_group_5_25to29 + age_group_5_30to34 + age_group_5_35to39 + age_group_5_40to44 + age_group_5_45to49 + age_group_5_50to54 + age_group_5_55to59 + age_group_5_60to64 + age_group_5_65_over')
result76 = model76.fit()
result76.summary()
Dep. Variable: | performance_rating | R-squared: | 0.119 |
---|---|---|---|
Model: | OLS | Adj. R-squared: | 0.076 |
Method: | Least Squares | F-statistic: | 2.776 |
Date: | Tue, 12 Apr 2022 | Prob (F-statistic): | 0.00210 |
Time: | 20:32:03 | Log-Likelihood: | 0.56895 |
No. Observations: | 239 | AIC: | 22.86 |
Df Residuals: | 227 | BIC: | 64.58 |
Df Model: | 11 | ||
Covariance Type: | nonrobust |
| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 5.021e+12 | 4.47e+12 | 1.124 | 0.262 | -3.78e+12 | 1.38e+13 |
gender_Female | -2.754e+12 | 2.45e+12 | -1.124 | 0.262 | -7.58e+12 | 2.07e+12 |
gender_Male | -2.754e+12 | 2.45e+12 | -1.124 | 0.262 | -7.58e+12 | 2.07e+12 |
race_grouping_white | -2.841e+12 | 2.53e+12 | -1.124 | 0.262 | -7.82e+12 | 2.14e+12 |
race_grouping_person_of_color | -2.841e+12 | 2.53e+12 | -1.124 | 0.262 | -7.82e+12 | 2.14e+12 |
age_group_5_25_under | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_25to29 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_30to34 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_35to39 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_40to44 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_45to49 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_50to54 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_55to59 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_60to64 | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
age_group_5_65_over | 5.731e+11 | 5.1e+11 | 1.124 | 0.262 | -4.32e+11 | 1.58e+12 |
Omnibus: | 12.228 | Durbin-Watson: | 1.654 |
---|---|---|---|
Prob(Omnibus): | 0.002 | Jarque-Bera (JB): | 12.731 |
Skew: | 0.556 | Prob(JB): | 0.00172 |
Kurtosis: | 3.202 | Cond. No. | 6.62e+15 |
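The formulas in this block are typed out by hand, and each one lists every dummy in every group. The hypothetical helper below (`reference_coded_formula` is our name, not part of the original analysis) generates the same kind of right-hand side but omits one baseline per group. The choice of baseline is a modeling decision. The picks used here (gender_Male, race_grouping_white, age_group_5_25_under) are illustrative assumptions only.

```python
def reference_coded_formula(outcome, dummy_groups, references):
    """Build a formula string that omits one reference dummy per group.

    dummy_groups: mapping of group name -> list of dummy column names.
    references: mapping of group name -> the column to treat as the baseline.
    """
    terms = []
    for group, columns in dummy_groups.items():
        baseline = references[group]
        if baseline not in columns:
            raise ValueError(f"{baseline!r} is not one of the {group} columns")
        terms.extend(c for c in columns if c != baseline)
    return f"{outcome} ~ " + " + ".join(terms)


dummy_groups = {
    "gender": ["gender_Female", "gender_Male"],
    "race": ["race_grouping_white", "race_grouping_person_of_color"],
    "age": [
        "age_group_5_25_under", "age_group_5_25to29", "age_group_5_30to34",
        "age_group_5_35to39", "age_group_5_40to44", "age_group_5_45to49",
        "age_group_5_50to54", "age_group_5_55to59", "age_group_5_60to64",
        "age_group_5_65_over",
    ],
}
# Illustrative baselines only; any single level per group would work
references = {"gender": "gender_Male", "race": "race_grouping_white", "age": "age_group_5_25_under"}

formula = reference_coded_formula("performance_rating", dummy_groups, references)
print(formula)
```

The resulting string can be passed to `statsmodels.formula.api.ols(formula, data=...)`. Each remaining coefficient is then read as a difference from the omitted baseline, holding the other groups fixed.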