Log of what Bryan Canela has learned at Techie Youth

Sat. Aug. 13, 2022

Submitting my proposal and learning about virtual assistants

Today at Techie Youth I posted my proposal for the AI assignment and started to work on the virtual assistant unit.

I have really enjoyed the process of doing this AI assignment and really want to see what people will say about the proposal that I have come up with.

After posting my plans I learned some tips for handling phone calls well as a virtual assistant. I need to have good energy and a happy tone so that the caller feels like the most important person. I have to speak clearly into the phone and avoid being too casual. I must know the basic answers to common questions that customers ask about. Rather than asking permission to put someone on hold, I can say something like "thanks for calling, I'll be right with you" and then place them on hold. The final point on calls that I learned was to not just hang up the phone but to actually say goodbye.

I learned that the responsibilities of a virtual assistant include responding to emails and phone calls, scheduling meetings, booking travel accommodations, managing a contact list, preparing customer spreadsheets and keeping online records, organizing managers' calendars, performing market research, creating presentations as assigned, addressing employees' administrative queries, and providing customer service. Some of the requirements for being a virtual assistant are relevant experience, familiarity with current technology, excellent phone, email, and messaging skills, excellent time management skills, and good organizational skills. Some of the traits that make a good virtual assistant are patience, honesty, discretion, being personable, professionalism, and being organized.

If I want to be a virtual assistant with no experience, I learned that it is recommended that I start small. This is because it's much easier to find potential clients, land jobs, and establish myself as an expert when I'm focused on a particular niche. Some common virtual assistant positions that I saw were social media assistant, blogger's virtual assistant, ecommerce assistant, real estate assistant, and general admin assistant. There are various websites I can use to find virtual assistant jobs such as Boldly, Fancy Hands, and Upwork. I also learned about some of the various skills that I need to have as a virtual assistant and how to find good deals on travel accommodations.

Thu. Aug. 11, 2022

Finishing my proposal assignment

Today I was finishing up my proposal assignment for the AI unit. The only thing I have left is to find a suitable place to post it in order to receive feedback. Over the course of doing this proposal I have learned many valuable Python skills. I have also discovered that I would definitely like to become a data scientist for my career, and I greatly appreciate the resources that Techie Youth has shown me over the course of finishing this assignment. The only struggle that I had today was figuring out the best way to present my data to show everything that I have learned. I can't wait to see the feedback, to see what more I can learn about data science, and to see how my skills continue to improve from here.

Wed. Aug. 10, 2022

Last touches on proposal

Today I continued to work on the last steps of my proposal. I have been taking the data that I have analyzed and explored and using it to draw conclusions about the solution that I have been working on, which is improving the outcomes of aged-out foster care children by partnering with companies to help support youth during their education. I theorized that this solution would help to improve the college graduation and high school completion rates of foster children. Along with this, I supported the decision of increasing the age youth are allowed in foster care up to 21. I found various sources backing the effectiveness of increasing the age of foster care to 21. I have been compiling and refining all the information on this and will be done soon with my proposal. I look forward to the feedback that I will receive on it so that I can improve as an aspiring data scientist.

Tue. Aug. 9, 2022

Finishing up my proposal

Today I was working on finishing the last touches when it comes to the data from my proposal. I have performed data exploration on almost all of the datasets that I have been using. On top of that I have found all the studies that I will be using to back up my proposal. The only thing I do not have much of is correlation between the various datasets, because I was not able to calculate the correlation coefficient between arrays of different lengths. Despite not being able to correlate the different datasets to each other, I was able to see interesting information such as which populations are most plagued by the issues that I was looking into. After having almost all of my data analyzed, what I need to do now is write up my proposal and come to conclusions based on the information that I have now. After this I will be looking for popular forums where I can submit it and get feedback on it.
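A small sketch of the length problem, with made-up numbers standing in for my real datasets: correlation functions need equal-length arrays, so one workaround is to correlate only the overlapping portion after aligning the series.

```python
import numpy as np

# Two made-up series of different lengths: np.corrcoef (and
# scipy.stats.pearsonr) cannot pair these up as-is.
admissions = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
shelter_cases = np.array([2.0, 4.0, 6.0])

# One workaround: align both series on a shared key (e.g. year or
# community district) and correlate only the overlapping portion.
n = min(len(admissions), len(shelter_cases))
r = np.corrcoef(admissions[:n], shelter_cases[:n])[0, 1]
print(round(r, 2))  # the overlapping values here are perfectly linear -> 1.0
```

A proper alignment would match rows on a shared column rather than just truncating, but truncation shows the idea.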

Mon. Aug. 8, 2022

Proposal assignment and final hurdles

Today at Techie Youth I started by continuing my work on my proposal. The first thing I worked on was continuing what I did on Friday and looking further into the efficacy of the methods currently implemented to help curb youth homelessness and improve the outcomes of aged-out foster care youth. A big solution that I researched for these issues is extending the age of foster care to 21, instead of 18 as it is now in most states. Studies that I have looked at have compared the costs of keeping youth in foster care until age 21 against the benefits gained from it. In one of the studies I looked at, the state is reimbursed by the federal government for part of the costs of keeping youth in foster care until age 21. One of the benefits found from extending foster care to 21 is that in the state of Illinois, where young people can and often do remain in care until their 21st birthday, former foster youth were twice as likely to have ever attended college and more than twice as likely to have completed at least one year of college by age 21 compared with former foster youth from Iowa and Wisconsin, where remaining in care beyond 18 is not an option. It was found that roughly 10.2% of youth will graduate college if they cannot remain in care until 21, as in most states, while being allowed to stay in foster care until age 21 will double the percentage of foster care youth who graduate from college. This study also found that there is an almost $2 benefit-to-cost ratio in increased earnings, due to higher rates of bachelor's degree completion, for every $1 spent on foster care beyond age 18.
Using this study, I will look for New York data on college completion rates for foster care youth before and after the age for aging out of foster care was raised to 21 and analyze its effectiveness. If the higher age limit is shown to be effective, I will also apply my earlier solution of providing better career centers to foster care youth on top of it.

After looking at more data I've decided to adapt my solution a bit for improving the outcomes of aged-out foster care youth. On top of providing more funding to career centers, I'm proposing a solution of partnering with companies to help pay for youth education. For students interested in higher education, the child services administration would partner with companies where the government would pay for most of a youth's higher education and a company would pay for part, with the company also providing support and mentoring for the student while they are completing their education. The government would pay a certain percentage of the costs, which the company must reimburse if the youth is not employed for a specified period of time after graduation without good reason. With this latest revision to my proposal, I think I am very soon going to be ready to submit my plan to a forum to get feedback on it and complete my assignment.

Fri. Aug. 5, 2022

Python learning and my proposal

Today I started by continuing the practice that I've been doing with Python dictionaries, after having worked with a few other datasets and seeing how much easier things would be with them. I learned that the number of times a unique value occurs is called its frequency, and a table of these frequencies is called a frequency table. When I'm analyzing frequencies, I learned that it may be very useful to find the proportions and percentages in the frequency table. I can use the in operator to determine whether or not a key is in the dictionary. Another thing I learned about dictionaries is that a for loop over a dictionary will loop over its keys, which definitely cleared up some problems that I was having while working on my proposal. I can use the dict.keys() and dict.values() methods to access a collection of all the keys or all the values in a dictionary, and the dict.items() method lets me iterate over the keys and values of a dictionary simultaneously. After learning more practical information about dictionaries in Python I did some of Kaggle's dictionary exercises, which helped me a lot.
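The dictionary features above can be sketched in a few lines; the borough values are made up for illustration.

```python
# Build a frequency table with a dictionary, then convert the
# counts into percentages (values are made up for illustration).
boroughs = ["Bronx", "Queens", "Bronx", "Brooklyn", "Bronx", "Queens"]

freq = {}
for b in boroughs:
    freq[b] = freq.get(b, 0) + 1   # count each unique value

total = len(boroughs)
percentages = {k: round(100 * v / total, 1) for k, v in freq.items()}

print("Bronx" in freq)             # the in operator checks keys -> True
for key, value in freq.items():    # .items() gives keys and values together
    print(key, value)
print(percentages["Bronx"])        # 3 of 6 entries -> 50.0
```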

Going over dictionaries has helped me greatly with my proposal because the datasets that I have been using present all their information in large tables. Using dictionaries has helped me to compute more percentages from my data and be able to draw better conclusions from it.

When it comes to my proposal, I started researching the efficacy of the methods currently implemented to help curb youth homelessness and improve the outcomes of aged-out foster care youth. One way in which aged-out foster care youth are being helped is federal law requiring that child welfare agencies help youth develop personalized transition plans that address topics such as education, housing, employment, finances, and physical and mental health. A transition plan should start early and be broken into short-term, age-appropriate, and manageable goals that align with a youth's individual needs, hopes, and strengths.

States also offer an array of services and resources designed to assist youth in foster care as they work toward self-sufficiency. Educational and Training Vouchers (ETVs) are grants, funded by the federal government and administered by the states, awarded to eligible current and former foster youth. ETV coordinators are responsible for helping youth in foster care obtain funding for postsecondary education. There are also Independent Living programs, which are federally funded programs designed to help eligible youth make successful transitions from foster care to independent living. Independent living coordinators in each state are responsible for helping youth in foster care access services geared toward achieving self-sufficiency prior to exiting foster care.

This information covers what is being done for aged-out foster care youth and partly covers helping to prevent youth homelessness through funding. After researching more of the current solutions, I am going to look at how to measure their efficacy.

Thu. Aug. 4, 2022

Reviewing prior topics and proposal continuation

Today with Techie Youth I started by going over prior units in order to retake the quizzes that I did not do well enough on. The units that I had to go over were finances and how to work remotely. When it comes to finances, I know it is important to determine what your money is being spent on and what you can save money on each month. The most basic thing is cutting down on luxuries. In order to cut down on luxuries I have to determine what the necessities are, which are things like food, transportation, and utilities. Following finances I also reviewed how to better work remotely in order to take the quiz. What I reviewed about remote working was the etiquette of Zoom calls and the importance of staying in touch with your colleagues while working remotely. Another important thing is to have places to talk with colleagues about things that are not work related.

After reviewing units for quizzes I continued to work on my proposal for the AI unit assignment. The dataset that I performed data exploration for today is Individual Census by Borough, Community District, and Facility Type. This dataset presents the number of individuals for each shelter facility type by borough and community district. What this will do is help me to see the demographics of each shelter to also confirm areas where people of those demographics reside to see if I need to focus parts of the solutions for my proposal in those specific areas. I have made some preliminary findings based on this dataset that I have analyzed and on the prior ones that I have done. A very important part that I have to do is relating all the data to each other and beginning to draw some preliminary conclusions from all this. I also have been continuing to look up and read studies that I can include in part of my proposal as to the causes of youth homelessness and aged out foster care youth prospects. The reasoning behind me looking up studies for this is because finding the cause behind big issues like this is something that is better taken care of by researchers who've thoroughly investigated these problems. I would be doing a disservice to myself and those I would be presenting to if I tried to act like I could make reasonable conclusions as to the causes of these societal issues on my own. There is no way I can come to statistically significant causes through datasets I find online versus a team of researchers working on this who have already published studies on the issues.

Wed. Aug. 3, 2022

AI assignment proposal and Python practice

Today I continued to work on my proposal for the AI assignment. The dataset that I started to work on was Children Served in Preventive Services by Borough and CD. This dataset contains counts of children served in child welfare preventive services by borough and community district of family. I performed some data exploration on this dataset in order to get an idea of the number of children who are possibly at risk of homelessness or would be served well by a program such as Techie Youth. I was able to come to this conclusion due to studies that I have been looking at while working on my proposal showing that many homeless and runaway youth come from abusive households. By taking a look at this information I would be able to better help youth who experience many hardships after coming from abusive households. Another dataset that I performed data exploration on is Demographics by Community District. This dataset provides a demographic breakdown of only DYCD-funded participants within each community district of NYC. The data displays the counts and percentages of the participants' gender, ethnicity, and race. I am going to be using this data to see which groups are most served by DYCD and compare it to the demographics of each district in order to determine the areas to keep an eye on in order to reduce instances of youth homelessness and improve youth outcomes.

After working on my proposal I completed more exercises in Python, specifically on dictionaries and applying them. A dictionary in Python is similar to a list, but instead of numeric index positions, entries are looked up by keys, which can be any immutable value such as a string or a number. Each key is paired with a value, and together they are called a key-value pair. This will be useful for my proposal because there are many datasets where it would be very useful to store information as key-value pairs, such as demographics. This will not only make it easier to reference the data, but also easier to present and understand from someone else's perspective.
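A tiny sketch of the key-value idea for demographics (the numbers are made up):

```python
# Store one district's demographic percentages as key-value pairs
# instead of positional list entries (numbers are made up).
district_demo = {"female": 48.7, "male": 51.3}

# A key makes the lookup self-describing, unlike district_list[0].
print(district_demo["female"])  # 48.7
```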

Tue. Aug. 2, 2022

Proposal continuation and Python exercise

Today I started by learning more Python through practice exercises for the topics I learned earlier in order to better apply them to my proposal. The first thing I started with was practicing correlation exercises through DataCamp. Finding correlations here uses the NumPy, pandas, and SciPy modules. I first practiced loading information into pandas DataFrames and finding the correlation between two columns. The pandas method I used was .corr(). The .corr() method includes a method parameter which can be used to choose between three correlation coefficients; the default is Pearson's coefficient. In order to find the correlation coefficients for all variables I can call this method on the DataFrame itself. This gives me a matrix, though, and I was shown how to use a heat map to represent all of these correlation coefficients and make them easier to interpret. A very important aspect that I learned about regarding correlation is that correlation does not imply causation. The correlation only quantifies the strength and the direction of the relationship between two variables. There might be a strong correlation between two variables, but it does not allow us to conclude that one causes the other. I also learned that when strong correlations are not causal, they are called spurious correlations.
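A minimal sketch of the .corr() workflow, using made-up columns in place of my real datasets:

```python
import pandas as pd

# Made-up columns standing in for real dataset variables.
df = pd.DataFrame({
    "admissions": [10, 20, 30, 40],
    "shelter_cases": [12, 24, 33, 41],
    "population": [5, 3, 4, 2],
})

# .corr() on the whole DataFrame returns a matrix of pairwise
# coefficients; method="pearson" is the default.
corr = df.corr(method="pearson")
print(corr.round(2))
```

To visualize the matrix, a heat map such as seaborn's sns.heatmap(corr) can be drawn over this result.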

After the correlation exercises I started to work on DataQuest exercises in order to get better at using Python dictionaries and longer conditional statements in functions. I completed various exercises nesting if statements while using them in conjunction with various modules to form more complex code. This has definitely helped me with some of the issues I've been having with my proposal.

For my proposal I started to perform data exploration on the DHS dashboard. This dataset displays demographics for the families and individuals residing in the Department of Homeless Services (DHS) shelter system. I was able to come to some preliminary conclusions for this dataset but need to perform some more in depth data exploration for the datasets that I am using along with comparing them more to one another after some further practice. I think my proposal is coming along very well and am looking forward to seeing the further conclusions I am able to draw from all this information.

Mon. Aug. 1, 2022

Continuing on AI assignment proposal

Today I continued to work on my proposal for improving the outcomes of youths aged out of foster care and reducing instances of youth homelessness. I am rethinking the aspect of decreasing youth incarceration rates in my plan while I am in the process of analyzing various datasets. Depending on the conclusions that I arrive at from the other datasets for my proposal, I will revise my plan during its finalization. The next dataset that I have been analyzing is Detention Admissions by Community District. This dataset captures counts of youth entering detention in a calendar year by their home borough and community district. I have performed sufficient data exploration on this and decided to compare this dataset to another one. The dataset I am comparing it to is Associated Address by Borough and Community District. This dataset presents the number of cases and individuals for each shelter case type by borough and community district. After having performed data exploration on this dataset, the issue I ran into was not knowing exactly how to compare these two datasets in a quantifiable manner. This is where I started to learn about how to find the correlation between datasets in Python. Correlation is the relationship between two variables, and there can be three different types of correlation. There is positive correlation, where both variables change in the same direction. Neutral correlation is when there is no relationship in the change of the variables. Negative correlation is when the variables change in opposite directions. I learned that the performance of some algorithms can deteriorate if two or more variables are tightly related, which is called multicollinearity. An important aspect of finding correlation between two datasets is something called covariance. Covariance is the relationship between two variables, and it is calculated as the average of the product of the values from each sample, where the values have first been centered (had their mean subtracted).
The sign of the covariance can be interpreted as whether the two variables change in the same direction (positive) or in different directions (negative). The magnitude of the covariance is not easily interpreted. A covariance value of zero indicates that there is no linear relationship between the two variables.

The way correlation is usually calculated in Python is through the Pearson correlation coefficient, which is used to summarize the strength of the linear relationship between two data samples. The Pearson's correlation coefficient is calculated as the covariance of the two variables divided by the product of the standard deviations of the two data samples. It is the normalization of the covariance between the two variables to give an interpretable score. The pearsonr() SciPy function can be used to calculate the Pearson's correlation coefficient between two data samples of the same length. Another correlation method is Spearman's correlation, for variables that have a nonlinear relationship or do not have a Gaussian (normal) distribution. When this is the case for data samples, Spearman's correlation can be used, but there may be lower coefficient scores, which lie between -1 and 1. The function used for this is spearmanr(), also part of the SciPy package, and it too requires data samples of the same length.
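A short sketch of both functions on synthetic data (generated here only to have something to correlate):

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic data: y depends on x linearly plus some noise.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)

pearson_r, pearson_p = pearsonr(x, y)     # both return (coefficient, p-value)
spearman_r, spearman_p = spearmanr(x, y)  # rank-based, for nonlinear data

print(round(pearson_r, 2), round(spearman_r, 2))
```

Both functions raise an error if the two samples have different lengths, which is the issue I ran into with my datasets.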

After learning all this and following some tutorials, I will be practicing this through exercises later in order to make sure that I can use these functions properly.

Fri. Jul. 29, 2022

Working on my proposal and revisiting skills necessary for it

Today I went further into what I will be working on for my proposal and did more of the necessary research and information gathering. My proposal is to focus on youth incarceration and youth homelessness, which is in part affected by the outcomes of aged-out foster youth, and provide solutions that might help with these problems. Some of the solutions that I have come up with on a preliminary basis are expanding DYCD programs to other areas, determining what those areas should be, and providing career centers for aged-out foster care youth in order to curb youth homelessness. I will be analyzing NY State and NYC datasets for statistics on the populations plagued with these issues, along with trying to find and analyze information showcasing the effectiveness of solutions that have already been implemented. I started my proposal by first looking at how Techie Youth is helping to address these issues. I saw that Techie Youth's solution to the problems I have mentioned above is leveraging education in IT & computer technologies to redirect the roadmap of foster kids and youth-at-risk towards a path of life success. Along with this, Techie Youth's training provides each participant with professional skills that will help to ensure they are always capable of obtaining lucrative employment. I believe that Techie Youth's solution helps to curb youth homelessness by providing youth with the necessary skills to support themselves through gaining meaningful employment. I will further analyze datasets to look for the biggest causes of youth homelessness and analyze the outcomes of career centers themselves to determine their efficacy as a solution to homelessness and to improving the outcomes of aged-out foster care youths.
If I am able to find statistical significance in career centers helping to prevent homelessness, I would like to append to my proposal a part including Techie Youth as a resource provided to youths nearing the age at which they would age out of foster care.

As part of the process of being a data scientist, I started to form hypotheses on causes of youth homelessness and incarceration in order to help me decide which aspects I should focus on. Some hypotheses I came up with are that these problems are caused by a lack of skills applicable to the workforce, a lack of support after aging out of foster care, and higher poverty rates leading to higher levels of youth incarceration. Another skill that I have been working on today is my scientific literacy, in order to improve my ability to analyze studies on the topic that I have chosen. This will help me to determine the validity and significance of these studies myself without needing to rely on any articles or outside sources that might come to different conclusions from the studies performed. The first dataset I started to work on is Adolescents in Foster Care (permanency outcomes). This dataset will let me see current youth outcomes after foster care to help me determine if my current solutions have some validity to them or if I need to go back to the drawing board. After performing some data exploration on this dataset I realized that I needed to work on my data cleaning skills and went to Kaggle's tutorial in order to perform some exercises to get better acquainted with it. I had trouble with inconsistent data entry and how to handle missing values, so I have been delving further into data cleaning by completing various practice exercises and some extra readings, since practice is the best way to learn coding.
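A sketch of the two cleaning problems I ran into, on a made-up table:

```python
import pandas as pd

# Made-up table with the two problems mentioned above: inconsistent
# text entries ("Bronx " vs "bronx") and missing values (None).
df = pd.DataFrame({
    "borough": ["Bronx ", "bronx", "Queens", None],
    "count": [10, None, 7, 3],
})

# Inconsistent entries: strip whitespace and standardize the case.
df["borough"] = df["borough"].str.strip().str.title()

# Missing values: drop rows missing the key column, fill numeric gaps.
df = df.dropna(subset=["borough"])
df["count"] = df["count"].fillna(0)

print(df["borough"].nunique())  # "Bronx " and "bronx" collapse into one
```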

I am looking forward to how my proposal comes along and know that there will be various struggles along the way. As I keep working on it, I will be sure to use the various resources I have been provided with by Techie Youth to figure out answers to any problems that I might have.

Thu. Jul. 28, 2022

Continuing Titanic challenge and finding necessary data for my plan

Today I started by going through all of the databases that I was provided by Techie Youth in order to find the data I would need to form my proposal. The proposal that I have come up with for now is further expanding DYCD after school programs, and any other programs provided, into zip codes containing demographics that are primarily affected by youth homelessness and youth incarceration. On top of this I wanted to propose adding career centers to NYC districts with higher poverty levels than those around them. Other data I will be looking at includes foster care placements by zip code, to also help determine the areas to focus more programs on. Some datasets that I found are as follows: Adolescents in foster care (permanency outcomes), detention and placement demographics, detention admissions by community district, individual census, homeless population by year, children served in preventive services by borough, demographics by community district, DHS data dashboard, and associated address by borough and community district. Permanency outcomes of adolescents shows statistics on youth in foster care. Detention and placement demographics is a table showing annual admissions to detention by age, race, gender, admitting authority, top arrest charge level, and zip code. I will be analyzing this dataset to show the types of charges youths are being brought in on and the districts they are from, to help decide where parts of my proposal should focus. Detention admissions focuses on youth being brought into detention centers by borough and community district. I will be analyzing this data to show the locations where youth are incarcerated the most. The other datasets that I included will be analyzed depending on the studies I find and other information that I see on the causes of the issues that I will be focusing on in my proposal.

After fitting the data to the random forest, the prediction for each passenger is saved to a file called submission.csv. The submission is graded based on the accuracy of each prediction. I tested my predictions against the test data. Seeing the prediction results, I was able to see that a random forest model alone is not sufficient for getting accurate predictions on this data. As I read before, I think I would need an ensemble of models in order to get better predictions. I will go through more tutorials and stay on Kaggle to further improve my data science and model-making skills.

Wed. Jul. 27, 2022

Doing Titanic Kaggle competition and looking/planning AI assignment

Today I started to work on the Titanic Kaggle competition. The first thing I did was look at all my data and understand what exactly the competition is asking for. The problem I am trying to answer is determining what types of people were more likely to survive the Titanic sinking based off of passenger data. I am given three sets of data: train.csv, test.csv, and gender_submission.csv. Train.csv is the training data that I am being given. With the patterns that I find from the training data, I use the test.csv data to predict whether or not a passenger survived. Gender_submission.csv is a set of predictions that assumes all and only female passengers survived the Titanic, and it shows me what a submission file should look like. After taking a cursory look into this data I loaded it into Kaggle's notebook and printed out the first few rows using the head method. I was looking at the train data and test data in order to form some preliminary thoughts. After I looked at my data I started to make some hypotheses. Some of my hypotheses about factors that influenced whether or not someone survived were whether they were male or female, the age of the passenger, the location of the ticket, whether a passenger boarded with any family members, the class of the ticket, and the fare of the ticket. After coming up with my hypotheses I performed some data exploration. I started by determining the percentage of men who survived versus women who survived. I did this by selecting the Survived column for male passengers, summing it, and dividing by the number of men, then doing the same for female passengers. The information that I got was that only 18.89% of men survived versus 74.2% of women. This let me know that gender was a very big indicator of whether or not someone survived. I performed more data exploration and determined the percentage of people that survived for each ticket class as well.
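The survival-rate calculation can be sketched with a tiny made-up sample in place of Kaggle's train.csv:

```python
import pandas as pd

# Tiny made-up sample standing in for Kaggle's train.csv.
train = pd.DataFrame({
    "Sex": ["male", "female", "male", "female", "male", "female"],
    "Survived": [0, 1, 0, 1, 1, 0],
})

# Select the Survived column per gender, then sum and divide by count.
women = train.loc[train["Sex"] == "female", "Survived"]
men = train.loc[train["Sex"] == "male", "Survived"]

print(round(100 * women.sum() / len(women), 1))  # 66.7 on this sample
print(round(100 * men.sum() / len(men), 1))      # 33.3 on this sample
```

On the real training data this same calculation gives the 18.89% and 74.2% figures.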

The model that I used was the random forest that I talked about before. A random forest works by having multiple decision trees that each take in passenger data and determine whether the person survived. The outcome with the most votes ends up winning. Some of the features I decided on here are sex, ticket class, whether the individual had any siblings or a spouse aboard, and whether the individual boarded with any parents or children. From here I saved the features to X and saved y as my training target, being whether or not the passenger survived. Afterwards I fit this data to my random forest model.
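The fit step looks roughly like this; the eight rows are made up to stand in for train.csv:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up rows standing in for train.csv (Sex already encoded 0/1).
train = pd.DataFrame({
    "Sex":      [0, 1, 0, 1, 0, 1, 0, 1],
    "Pclass":   [3, 1, 3, 2, 1, 3, 2, 1],
    "SibSp":    [0, 1, 0, 0, 1, 0, 2, 0],
    "Parch":    [0, 0, 1, 0, 0, 1, 0, 0],
    "Survived": [0, 1, 0, 1, 0, 1, 0, 1],
})

X = train[["Sex", "Pclass", "SibSp", "Parch"]]  # features
y = train["Survived"]                           # training target

# 100 trees each vote on a passenger; the majority outcome wins.
model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(X, y)
predictions = model.predict(X)
```

In the real competition the model predicts on test.csv and those predictions are written to submission.csv.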

Once I had set aside the Titanic Kaggle competition I started to look at some of the topics I had available and am leaning towards focusing on preventing incarceration of minors or reducing instances of homeless youth. I've started to look at data on what causes homelessness in youths and what leads to them getting incarcerated. Some data I will be taking a look at are zip code demographics and which demographics are most plagued with youth incarceration and homelessness. After analyzing all this data and more, I will come up with my proposal.

Tue. Jul. 26, 2022

Python challenge analysis, coding project and data visualization

Today through Techie Youth I learned about the sample housing price prediction challenge, which goes into detail about what to do for data analysis and challenges in general. Going over this challenge will help me to understand what to do in future challenges and how to work on the Titanic one I will be completing. Like many of the resources I've seen, this machine learning project stressed that the best way to highlight my data science knowledge is through the completion of projects. I learned that there is a strict process to follow when it comes to being a good data science professional. The steps of this process are understanding the problem, hypothesis generation, getting the data, data exploration, data preprocessing, feature engineering, model training, model evaluation, and model testing. It is very important to be thorough when going through this process and not skip any steps. Like I've seen earlier, I need to know what question I am being asked. This project goes over Kaggle's housing data set and is aimed at predicting the prices of houses. The next step is generating hypotheses, which entails thinking of factors that can influence house pricing. There are two parts to defining a hypothesis: the null hypothesis (Ho) and the alternate hypothesis (Ha). Ho means that a particular feature has no impact on the dependent variable, and Ha means the feature has a direct impact on the dependent variable. Based on a decision criterion (such as a 5% significance level) we always 'reject' or 'fail to reject' the null hypothesis in statistics. A 5% significance level essentially means I only reject the null hypothesis when a result that extreme would occur by chance less than 5% of the time. When I'm building a model I look for probability, or p, values. When p is less than .05 I reject the null hypothesis, and if p is greater than .05 I fail to reject it. It is important to think of factors that can affect the problem I am looking at and continuously write them down.
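The reject / fail-to-reject decision rule can be sketched as a tiny helper; the 0.05 threshold is the 5% significance level described above:

```python
def hypothesis_decision(p_value, alpha=0.05):
    """Decide on the null hypothesis Ho at significance level alpha."""
    if p_value < alpha:
        return "reject Ho"       # evidence the feature affects the target
    return "fail to reject Ho"   # no evidence of an effect

print(hypothesis_decision(0.01))  # → reject Ho
print(hypothesis_decision(0.30))  # → fail to reject Ho
```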
Getting the data is just downloading my data set and loading it into whatever Python IDE I am using.

The housing data set specifically has 81 explanatory variables, which means there is going to be a lot of data exploration. I learned that there is a saying that a good data exploration strategy can solve even complicated problems in a few hours. There are various strategies that comprise good data exploration: univariate analysis, bivariate analysis, multivariate analysis, and cross tables. Univariate analysis is used to visualize one variable in one plot, for example with histograms and density plots. Bivariate analysis is used to visualize two variables in one plot, for example with bar charts and line charts. Multivariate analysis is used to visualize more than two variables at once; examples are bar charts and dodged bar charts. Cross tables are used to compare the behavior of two categorical variables. As part of the exploration of the data I saw that the percentage of missing values for each explanatory variable was calculated. When the sale price data was visualized I saw that it was right skewed. The data needed to be log transformed to be normally distributed. The purpose of normally distributing a target variable is that it helps in better modeling the relationship between the target and independent variables. Not every single variable was approached visually; instead some variables were looked at based on their correlation with the target variable. Going through this tutorial showed me how in-depth data exploration is. Just following along and practicing the data exploration for the housing data set gave me various insights into data set analysis and put into perspective how complex this can really be.
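The log transform for a right-skewed target can be sketched with NumPy; the prices here are made-up stand-in values, with one large outlier pulling the mean far above the median:

```python
import numpy as np

# Right-skewed "sale price" stand-in data.
prices = np.array([100_000, 120_000, 135_000, 150_000, 900_000])

# log1p handles zeros safely; np.log would also work for strictly positive prices.
log_prices = np.log1p(prices)

# On the raw scale the mean sits far above the median (right skew);
# after the transform the two are much closer together.
print(prices.mean(), np.median(prices))
print(log_prices.mean(), np.median(log_prices))
```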

After following along with the tutorial and notes for the housing data set I looked at how to visualize data through the use of heatmaps and density plots. The seaborn package was used in order to use different data visualization tools. Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics. Heatmaps are made with the sns.heatmap function, with the arguments being the data you want to use and annot, which ensures that the values for each cell appear on the chart. Density plots, or more specifically kernel density estimates (KDE), are smoothed-out histograms. To make a KDE plot I use the sns.kdeplot command. The arguments of this function are the data we are using and shade, which determines whether the plot area is shaded. I can create 2D KDE plots with the sns.jointplot command.

Finally, for my coding project, since I was struggling with it I decided to do more Python practice problems on for loops and conditions. I found that I have been struggling with nested for loops and how/when to apply them, so I have started to do more practice problems on DataQuest and other resources that I have found online.
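A small example of the kind of nested for loop I have been practicing: the inner loop runs completely for every single pass of the outer loop.

```python
# Nested for loops: here the outer loop walks over rows of a small
# multiplication table and the inner loop over columns.
table = []
for row in range(1, 4):
    for col in range(1, 4):
        table.append(row * col)
print(table)  # → [1, 2, 3, 2, 4, 6, 3, 6, 9]
```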

Mon. Jul. 25, 2022

Python coding project and learning more about AI

Today I started with reading more articles provided by Techie Youth about AI. The first thing I learned was the steps involved in starting a robotics company. Step one is planning out the business. Part of planning out the business includes calculating the costs involved in opening a robotics company. Some things required are an office, a production center, computers, printers, high-speed internet, and the raw materials necessary for the robots themselves. Included in the startup costs is hiring engineers if I am incapable of building the robots myself, with a team still being necessary later on as my business expands. Ongoing expenses for a robotics company include computers, modeling software, printers, insurance, utilities, parts and materials, personnel, office space, and a production center. I also need to include a few thousand in travel expenses and a minimum of $5k a year for computers, software, and printers. After that, personnel salaries will take up a large part of the budget. The following planning steps are figuring out my target market, calculating how much to charge customers, calculating profit, and lastly naming my business. The next step for making a robotics company is forming a legal entity. The most common business structure types are sole proprietorship, partnership, limited liability company (LLC), and corporation. Establishing a business as an LLC or corporation protects me from being held personally liable if my company is sued. After this I need to register for taxes, and then open and set up a business bank account. I would then need to obtain the necessary licenses and permits along with getting business insurance. Once I've set up all the legal and logistical framework I need to define my brand and move on to marketing. Marketing includes things like social media outreach, marketing campaigns, and starting a website. Finally, robotics companies have a lot of potential for growth, and all it takes is developing one in-demand product.

I continued the coding project that I've been doing through DataQuest. The objective of this project is analyzing data on helicopter prison escapes and making observations based on this data. The dataset comes from this Wikipedia article - https://en.wikipedia.org/wiki/List_of_helicopter_prison_escapes#Actual_attempts. From this dataset I removed any unnecessary entries, including the details section that is irrelevant for analysis, and extracted only the year from the listed dates since I did not really need the exact dates, just the years. I extracted the year using the fetch_year method, which is from a helper package that was provided. This helper package made use of various commands from the pandas package in order to streamline and simplify the coding project. Through the project I tracked things such as the number of attempted prison escapes using helicopters for each year, along with how to better organize and select my information from a dataset. I am looking forward to finishing up my project and cannot wait to delve deeper into the skills required to become a data scientist.
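The provided helper's internals weren't shown, but a hypothetical re-creation of what fetch_year does might look like this (the regex-based approach is my assumption, not the helper's actual code):

```python
import re

def fetch_year(date_string):
    # Hypothetical stand-in for the helper's fetch_year: grab the first
    # four-digit year appearing anywhere in a date string.
    match = re.search(r"\d{4}", date_string)
    return int(match.group()) if match else None

print(fetch_year("August 19, 1971"))    # → 1971
print(fetch_year("February 27, 1981"))  # → 1981
```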

When I was looking at the AI job listings provided, I learned that data science leads very nicely into many different fields such as machine learning and advanced AI development. Many jobs asking for data scientists are involved with robotics and using AI. This makes me look forward to my future career prospects, and I love how much flexibility there is in the data science field.

Fri. Jul. 22, 2022

Algorithms, Machine Learning Intro and coding projects - I learned this today at Techie Youth

Today I continued Khan Academy's algorithms unit in preparation for working on my own project to make my code more efficient. The first thing I learned about is how to improve the efficiency of recursive algorithms. I learned that recursion is a good way to solve many problems but that it can be inefficient in terms of time and space. A technique to improve these algorithms is called memoization, which saves the computer time when making identical function calls. Memoization remembers the result of a function call with particular inputs in a lookup table and returns that result when the function is called again with the same inputs. Memoization makes a trade-off between time and space. As long as I am efficiently looking things up and I am calling a function repeatedly, the computer can save time at the cost of using memory to store the memo. Another thing I learned was that sometimes the best way to improve recursive algorithm efficiency is to not use recursion at all. There is an iterative technique called the bottom-up approach which can save time and space. For this approach, the computer solves the subproblems first and uses the partial results to arrive at the final result. After this I learned that memoization and the bottom-up approach are both techniques from dynamic programming, which is a problem-solving technique in math and computer science. Dynamic programming can be used when a problem has optimal substructure and overlapping subproblems. Optimal substructure means that the optimal solution to the problem can be created from optimal solutions of its subproblems.
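The two techniques can be sketched with the classic Fibonacci example: a memoized recursive version that caches results in a lookup table, and a bottom-up iterative version that solves the smallest subproblems first:

```python
# Memoized (top-down) Fibonacci: results of earlier calls are cached in `memo`,
# trading memory for time.
def fib_memo(n, memo=None):
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]
    if n < 2:
        return n
    memo[n] = fib_memo(n - 1, memo) + fib_memo(n - 2, memo)
    return memo[n]

# Bottom-up version: no recursion at all, just partial results built upward.
def fib_bottom_up(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib_memo(30), fib_bottom_up(30))  # → 832040 832040
```

Without the memo, the naive recursive version recomputes the same subproblems exponentially many times.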

Following recursive techniques and dynamic programming I learned about merge sort, which is a divide and conquer algorithm. Merge sort divides the input array into two halves, calls itself for the two halves, and merges the two sorted halves. Divide and conquer is an approach that consists of three steps. The first step is divide, which involves dividing the problem into smaller subproblems. Next is conquer, where subproblems are solved by calling recursively until solved. The final step is combine, where subproblems are combined to get the final solution of the whole problem. For merge sort, the divide part of the algorithm does not do much; the real work happens in the combine (merge) step. Another sorting technique using divide and conquer is quicksort. Quicksort works by splitting a large array of data into smaller sub-arrays. Each iteration splits the input into two components, sorts them, and then recombines them. With this sorting method all the real work happens in the divide step, with combine doing nothing. Quicksort works in place while merge sort does not; what this means is that the algorithm does not use extra space for manipulating the input but may require a small amount of extra space for its operation. While merge sort and quicksort both use divide and conquer, the biggest difference between them is how they apply it. When it comes to real-world use, quicksort outperforms merge sort, and significantly outperforms selection sort and insertion sort.
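A minimal merge sort sketch; note that the divide step is a trivial halving while all the real work happens in the merge:

```python
def merge_sort(arr):
    # Divide: split the array in half until single elements remain.
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Combine: merge the two sorted halves (this is where merge sort does its
    # work, unlike quicksort, whose work happens in the partition/divide step).
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # → [3, 9, 10, 27, 38, 43, 82]
```

The extra `merged` list is also why merge sort is not in place, while quicksort partitions within the original array.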

After learning about two divide and conquer sorting methods I learned about breadth-first search. Breadth-first search finds the shortest path from a given source vertex to all other vertices, in terms of the number of edges in the paths. How this search method works is by first assigning two values to each vertex: a distance and a predecessor. The distance is the minimum number of edges in any path from the source vertex to that vertex, while the predecessor is the vertex that comes before it on a shortest path from the source. When there is no path from the source vertex to a vertex, the distance is infinite and its predecessor has the same special null value as the source's predecessor.
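A minimal breadth-first search sketch over an adjacency-list graph; here None plays the "special value" role for both the infinite distance of unreachable vertices and the null predecessor:

```python
from collections import deque

def bfs(graph, source):
    # distance = fewest edges from source; predecessor = the vertex before
    # this one on a shortest path (None for the source and unreachable nodes).
    distance = {v: None for v in graph}
    predecessor = {v: None for v in graph}
    distance[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if distance[v] is None:  # not visited yet
                distance[v] = distance[u] + 1
                predecessor[v] = u
                queue.append(v)
    return distance, predecessor

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "E": []}
dist, pred = bfs(graph, "A")
print(dist)  # → {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': None}
```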

I then followed up on the intro to machine learning tutorial on Kaggle, where I learned about a new model called a random forest. I learned of some of the downsides of a decision tree: a deep tree with a lot of leaves will overfit because each prediction is coming from historical data from only the few data points at its leaves, but a shallow tree with few leaves will perform poorly because it fails to capture as many distinctions in the raw data. A good way to get around this is through the use of the random forest model, which uses many trees and makes a prediction by averaging the predictions of each component tree. It generally has better predictive accuracy than a single decision tree and it works well with default parameters. There are other models with better performance than this, but many of them require getting the right parameters to provide more accurate predictions. After learning about random forests I then completed the following exercise, where I had to build a random forest model. For this model I only had to replace the decision tree model that I built earlier with a random forest. It was surprising to me how much lower the error was for random forests vs. decision trees, with random forests blowing decision trees out of the water. I also saw that random forest models are very beginner friendly because one of their best features is working reasonably well even without tuning.
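The averaging idea behind a random forest can be illustrated without scikit-learn; the "trees" here are just hard-coded rules standing in for real decision trees, and the forest's prediction is the mean of their individual predictions:

```python
# Toy illustration of the random-forest idea: several component "trees"
# (here simple threshold rules, not real trees) each predict a value,
# and the forest averages them.
def tree_1(x): return 10.0 if x > 5 else 2.0
def tree_2(x): return 12.0 if x > 4 else 3.0
def tree_3(x): return 9.0 if x > 6 else 1.0

def forest_predict(x, trees=(tree_1, tree_2, tree_3)):
    return sum(t(x) for t in trees) / len(trees)

print(forest_predict(7))  # → (10 + 12 + 9) / 3 ≈ 10.33
print(forest_predict(1))  # → (2 + 3 + 1) / 3 = 2.0
```

Averaging damps the mistakes of any single over-specific tree, which is why the ensemble usually beats one tree.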

The final thing I learned today was more about Python, reinforcing some of what I've learned through a self-guided project I have been doing. For this project I needed to answer the questions of which year had the most helicopter prison break attempts and which countries have the most helicopter prison breaks. For this project I learned how to use Jupyter Notebook, which is an online code editor. Jupyter Notebook combines coding, narrative text, and visualizations, which are key elements for data science. I learned that because Jupyter makes it easy to combine those three elements, it is important to know how to use it as a data scientist, and this is a reason that many high-profile companies use it. A big advantage of Jupyter Notebook is that it allows me to easily share projects with other people.

Thu. Jul. 21, 2022

Reinforcing financial literacy knowledge and expanding my Python skills

Today I went further into Kaggle's machine learning introduction tutorial. To build on what I went over yesterday I started learning how to code for validating any machine learning models I make. Part of the scikit-learn library is the train_test_split function. What this function does is break data up into two pieces. I then use some of that data as training data to fit the model, and use the other data as validation data to calculate the mean absolute error. Following this I did exercises where I split data and then used the data I previously stored in the X DataFrame with my target being y. After this I used a DecisionTreeRegressor with the random state being 1. I then used this decision tree to make predictions with the data I set up earlier. An important part of making predictions with data is inspecting said predictions. One thing I noticed is that the predictions I got from the training data were different from those for the validation data. This is because the model picks up on patterns that are present in the training data but might not be in the rest of the data set, which skews predictions. To finish up this exercise on validating data I calculated the mean absolute error of my model. I learned that there is no general rule for what a good MAE is across applications, but that there are various ways to use and improve on this number.
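The validation workflow above can be sketched as follows; the housing numbers are made-up stand-in data for the exercise's real dataset:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Made-up stand-in for the housing data used in the Kaggle exercise.
data = pd.DataFrame({
    "LotArea":   [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000, 1993, 2004, 1973],
    "SalePrice": [208500, 181500, 223500, 140000, 250000, 143000, 307000, 200000],
})
X = data[["LotArea", "YearBuilt"]]
y = data["SalePrice"]

# Split off validation data the model never sees while fitting.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)

model = DecisionTreeRegressor(random_state=1)
model.fit(train_X, train_y)
val_predictions = model.predict(val_X)
print(mean_absolute_error(val_y, val_predictions))
```

Because the MAE is computed on held-out rows, it is an out-of-sample score rather than the misleading in-sample one.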

Following model validation I learned about underfitting and overfitting. Overfitting is a phenomenon where a model matches the training data almost perfectly, but does poorly in validation and on other new data. This can happen when a decision tree has many branches. As a decision tree gets deeper, the dataset gets sliced up into leaves with less data. Leaves with little data in them will make predictions that are close to the training data values, but make unreliable predictions for new data because they are making predictions off of insufficient data. Underfitting is when a model fails to capture important distinctions and patterns in the data and performs poorly even on the training data. This can result from a decision tree being too shallow. I saw that it is important to strike a balance between underfitting and overfitting since it determines a model's accuracy on new data. A way to control the depth of a decision tree is through the max_leaf_nodes argument, which provides a sensible way to control overfitting vs. underfitting. The more leaves a model is allowed to make, the less underfitting there is, while allowing too many leaves leads to more overfitting in my model. I learned that a function can be used to compare the mean absolute error for various leaf counts. What you do is make a for loop that compares the accuracy of models built with different values for max_leaf_nodes. It was summarized that overfitting captures spurious patterns that won't recur in the future, leading to less accurate predictions, while underfitting fails to capture relevant patterns, also leading to less accurate predictions. After learning about these two topics I again started to do exercises relating to them.
I created a function called get_mae that fit a model and gave me back its MAE, and following that I made a list of different values of max_leaf_nodes for the model, writing a for loop that would give me back the best number of leaf nodes from the list that I inputted. Doing these exercises I learned that when making a decision tree, after finding the best size for said tree I can just input other data without needing to adjust the size anymore.
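A sketch of the get_mae function and the loop over max_leaf_nodes values, using tiny synthetic data in place of the exercise's real train/validation split:

```python
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

def get_mae(max_leaf_nodes, train_X, val_X, train_y, val_y):
    model = DecisionTreeRegressor(max_leaf_nodes=max_leaf_nodes, random_state=0)
    model.fit(train_X, train_y)
    preds = model.predict(val_X)
    return mean_absolute_error(val_y, preds)

# Tiny synthetic data standing in for the exercise's train/validation split.
train_X = [[i] for i in range(20)]
train_y = [2 * i for i in range(20)]
val_X = [[i] for i in range(20, 25)]
val_y = [2 * i for i in range(20, 25)]

# Try several tree sizes and keep the one with the lowest validation MAE.
candidate_sizes = [5, 25, 50]
scores = {size: get_mae(size, train_X, val_X, train_y, val_y)
          for size in candidate_sizes}
best_size = min(scores, key=scores.get)
print(best_size, scores[best_size])
```

Once best_size is found, the final model can be fit with that setting on all the data.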

The last thing I did today was continue to learn more about Python through working on a prison break project where I take data about prison breaks to see how successful they are based on method. Working on my own project has really made me love this more and I cannot wait to do my own self guided projects in the coming future.

Wed. Jul. 20, 2022

More on machine learning and Python - What I learned today

Today I started learning more about Python through Kaggle and another website called DataQuest.

In Kaggle I continued the intro to machine learning tutorial. I learned that there are two ways to select data which are useful for ML: dot notation, which is used to select the "prediction target", and selecting with a column list, which is used to select features. When using dot notation to select the prediction target, the data is stored in a Series. The prediction target is the column we want to predict, and it is usually called y. After this I learned that the columns inputted into my model and used to make predictions are called features. There will be times when I use all columns except the target column as features, and others where I would be better off with fewer features. Multiple features can be selected by providing a list of column names inside brackets, with each item in the list being a string. I then saw examples of the head method, which shows the top few rows of a data set, and the describe method, which summarizes it. An important lesson I learned is that it is important to visually check my data with these commands, because I will frequently find surprises in the dataset that deserve further inspection. I then learned about a library used to create models called scikit-learn, which is written as sklearn in code. Scikit-learn is the most popular library for modeling the types of data usually stored in DataFrames. I learned the steps used when building a model: define, fit, predict, and evaluate. For define, I ask myself what kind of model I'm building while specifying some of its parameters. For fit, I capture patterns from the provided data; this is the foundation of modeling. Predict is just making predictions, and the evaluate step is where I determine how accurate the model's predictions are.
I've seen that many machine learning models allow for some randomness when training and that specifying a number for random_state, which controls that randomness, helps ensure I get the same results on each run of the model. Doing this is considered good practice in model training, and the specific number used does not significantly affect model quality. Following this I did some practice where the goal was to make a model that predicted the prices of houses. For the beginning of this exercise I specified my prediction target, the sale price, and saved it to a variable y. After this I created a variable called X that held the predictive features, which were other pieces of information about a house besides its selling price, such as the year it was built and the number of bedrooms in it. After this I specified the model I was going to use, a decision tree, and fit the data to it, which was the variables X and y.
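The define-fit-predict steps can be sketched like this; the house rows are a made-up stand-in for the exercise's dataset:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Made-up stand-in for the housing exercise data.
home_data = pd.DataFrame({
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
    "Bedrooms":  [3, 3, 4, 2, 4],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

y = home_data["SalePrice"]                # prediction target (a Series)
X = home_data[["YearBuilt", "Bedrooms"]]  # column list → feature DataFrame

# Define → fit → predict (evaluation comes later, with validation data).
model = DecisionTreeRegressor(random_state=1)  # random_state makes runs repeatable
model.fit(X, y)
print(model.predict(X.head()))
```

Predicting on the training rows themselves like this is only a sanity check; an unconstrained tree will reproduce them almost exactly, which is precisely the in-sample trap the next section warns about.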

After this I learned how to use model validation to evaluate the quality of a model. The measure of a model's quality is the accuracy of its predictions. A mistake that many people make when measuring predictive accuracy is making predictions with training data and comparing those predictions to the target values in that same training data. I saw that there are a multitude of metrics for summarizing the quality of a model, and the one I focused on today is called Mean Absolute Error (MAE). With MAE I take the absolute value of each error, where error is calculated as the actual value minus the predicted one. The way this is interpreted is "on average, our predictions are off by about X." As I talked about earlier, the reason using training data for making predictions is wrong is because the model can pick up patterns from a dataset that might not actually be present in real-world data. This is called an "in-sample" score. If the pattern that the model picks up from training data is not present when the model sees new data, then the predictions will be very inaccurate when put into use. A better way to measure the performance of a model is using data that wasn't used to build the model. The easiest way to do this is to exclude some data from the process of building the model and use it to test the model's accuracy on data it hasn't seen before. Data excluded in order to test model performance is called validation data.
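MAE can be computed by hand in a few lines; the actual and predicted values here are made-up:

```python
# MAE by hand: error = actual − predicted; take absolute values and average.
actual    = [200_000, 150_000, 320_000, 250_000]
predicted = [210_000, 140_000, 300_000, 260_000]

errors = [a - p for a, p in zip(actual, predicted)]
mae = sum(abs(e) for e in errors) / len(errors)
print(mae)  # → 12500.0, i.e. "on average, our predictions are off by about 12,500"
```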

Following these topics, I got more practice using longer if and else statements through DataQuest. I also got started on my first coding project, where I obtained real-world data from the internet and analyzed it using Python.

Going through all these tutorials and finally getting started on my first project makes me very excited to get further acquainted with this field. I look forward to being able to complete more projects and eventually being able to complete my Techie Youth assignment.

Tue. Jul. 19, 2022

Looking at more Python topics and learning more about Data Science

The first thing I learned about today was Techie Youth's social media platforms, and the various ways they have helped real young people and made a difference in their lives.

After seeing more about Techie Youth I learned about additional ways that Natural Language Processing (NLP) is used in marketing today. The first way is through use in chatbots for capturing leads. A lead is an individual or organization interested in what you are selling, and NLP is good for answering questions before handing the lead off to an actual person to talk to. NLP is used for voice searches today, with it being estimated that in a year or two more than half of all online searches will use voice. Sentiment analysis for understanding customers is another way NLP is used in marketing. What this is exactly is NLP extracting the intent and sentiment behind language, which is then used to derive the sentiment of conversations with individual customers, which in turn is used to steer the conversation towards a conversion. In this regard NLP can also be used to look at the sentiment of large groups. NLP allows for automated summarizing for early trend identification. Since NLP can aggregate relevant news, this enables marketers to extract important information about a topic from various online sources. Finally, NLP can be used for AI writing. An NLP system can generate optimized keywords or slogans and help with writing stories.

On top of the ways NLP is used in marketing, I was also taught about more ways that it is used in the business world today. One of these ways is through neural machine translation. Due to NLP learning similarly to how a person does, early machine translation was fairly bad, but as more words get added to an engine, the quality of its translations improves. The only caveat of using NLP for translating is that the quality depends on the number of words you give it, which takes time, and this is the reason why many businesses use already-established translators from large search engines. NLP is also used as a hiring tool by hiring managers. What NLP does here is scan applicant resumes for keywords to determine if the required job background is met, while extrapolating terms from the results of these keyword searches in order to expand on what is accepted. A final way NLP is used in business is through conversational text. A voice-activated system using NLP would listen for trigger phrases such as "what are" and search for the question that follows. This is helpful because it is estimated that the average employee spends 30 percent of their day searching for information, and searching in real-time conversations would help to improve productivity.

I've also learned about some of the steps necessary to build a successful NLP project. The first step is to build a strong foundation in the subject I am working on and familiarize myself with a programming language. Secondly I need to extract data from sources like social media and images, and process the data that I have gathered. Processing said data includes cleaning it and structuring it in a meaningful way. After this comes tagging word texts and making text mining algorithms. This includes converting text to numbers and computing the frequency of words, grouping them together to help me understand their hidden meanings. The final step of an NLP project is using an existing model to start making forecasts with the data, or making my own model for this.
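The word-frequency step above can be sketched with Python's standard library:

```python
from collections import Counter
import re

text = "NLP turns text into numbers. Counting words helps group words by meaning."

# Lowercase, split into word tokens, then count frequencies.
words = re.findall(r"[a-z]+", text.lower())
frequencies = Counter(words)
print(frequencies.most_common(2))
```

Real NLP pipelines build on exactly this kind of count before converting text to numeric features.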

The last thing I've learned about through Techie Youth is more about making my first machine learning model and also reinforcing the previous knowledge I have gained about Python through doing practice problems on Kaggle which will help me for my first challenge of analyzing a data set and making predictions from it.

Mon. Jul. 18, 2022

Digging deeper into Python and learning more through Kaggle

The first thing I started off with today was getting used to some more Python topics. I started with generators, which are used to create iterators, but with a different approach. Generators are simple functions which return an iterable set of items, one at a time. A generator runs when iteration over a set of items starts, using the for statement. When a generator reaches a yield statement it pauses its execution, goes back to the for loop, and returns a new value from the set. After generators I learned a bit more about pandas. What I learned in pandas was the different ways to index a pandas DataFrame. The easiest way to index in pandas is using square brackets like with normal lists in Python. Using a single bracket will output a Series, while a double bracket will output a DataFrame. Another way that I learned to index in pandas is using the loc and iloc commands. When using loc I learned that I have to specify rows and columns based on their row and column labels, and to use iloc I have to specify rows and columns by their integer index like normal.
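A short sketch of both topics: a generator that yields one value at a time, and the different pandas indexing styles on a tiny made-up DataFrame:

```python
import pandas as pd

# A generator: yields items one at a time instead of building the whole list.
def squares(n):
    for i in range(n):
        yield i * i  # execution pauses here and resumes on the next iteration

print(list(squares(5)))  # → [0, 1, 4, 9, 16]

# Indexing a DataFrame: brackets, loc (labels), and iloc (integer positions).
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
print(type(df["a"]).__name__)    # single brackets → Series
print(type(df[["a"]]).__name__)  # double brackets → DataFrame
print(df.loc["x", "b"], df.iloc[0, 1])  # label-based vs. integer-based
```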

Another topic I learned about relating to Python is what multiple function arguments are and how to use them. I learned how to declare a function that receives a variable number of inputs. In order to do that, I need to use specific syntax: def function(first, second, third, *therest). therest is a variable that collects the remaining inputs; it receives all arguments given to the function after the first three. I saw that it is also possible to send function arguments by keyword so that I don't have to worry about the order of said arguments. After this I learned what decorators are, which are simple modifications to callable objects like functions. All decorators are is functions that take a function and return one.
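A sketch of both ideas: *therest collecting the extra arguments, and a minimal decorator (the shout/greet names are just hypothetical examples):

```python
def function(first, second, third, *therest):
    # *therest collects any extra positional arguments into a tuple.
    return f"first={first}, rest={list(therest)}"

print(function(1, 2, 3, 4, 5))  # → first=1, rest=[4, 5]

# A decorator is just a function that takes a function and returns one.
def shout(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return f"hello {name}"

print(greet("techie youth"))  # → HELLO TECHIE YOUTH
```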

Following the previous topics I learned more about handling data that I receive. I specifically learned how to deal with inconsistent data entry such as misspellings, missing spaces, or inconsistent capitalization. I found out that inconsistencies in capitalization and trailing white spaces are very common in text data and that I can fix around 80% of text data inconsistencies by fixing these. An easy way to do this is making all text entries in a dataset lower case and removing all white space at the beginning and end of a cell. Another way I learned to fix inconsistent data entry is using the fuzzywuzzy package, which helps to identify which strings are closest to each other. The way the fuzzywuzzy package works is by using fuzzy matching, which is the process of automatically finding text strings that are very similar to the target strings. This package determines the closeness of strings by seeing how many characters I would have to change if I were to turn one string into another, with a lower number of characters meaning the strings are closer to one another. Fuzzy matching will not solve all of my problems regarding text entries in a data set, but it will help me to save time. Fuzzywuzzy returns a ratio given two strings, and the closer the ratio is to 100, the smaller the edit distance between the two strings. Through cleaning inconsistent text entries I learned that it is a better idea to write a general-purpose function that I can reuse if I might have to do a specific task more than once. What this does is keep me from copy-pasting code often, which saves time and reduces mistakes. Alongside fuzzywuzzy I used the pandas unique method, which lists the distinct values in a column. While learning about these methods to fix text entries I was also completing exercises provided by Kaggle to make sure I actually understood how to do it.
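Since fuzzywuzzy may not be installed everywhere, the standard library's difflib can stand in for its ratio idea; this is my substitute sketch, not fuzzywuzzy's actual implementation:

```python
from difflib import SequenceMatcher

def clean(entry):
    # Reusable clean-up helper: lowercase + strip fixes most text inconsistencies.
    return entry.lower().strip()

def ratio(a, b):
    # Stand-in for fuzzywuzzy's fuzz.ratio using only the standard library:
    # returns 0-100, higher meaning the strings are closer to one another.
    return round(SequenceMatcher(None, clean(a), clean(b)).ratio() * 100)

print(ratio("South Korea", "south korea "))  # → 100 after normalization
print(ratio("South Korea", "North Korea"))   # high but below 100
```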

Finally, I started to look into the Titanic data set challenge I will be doing on Kaggle and how to get started on machine learning. I was taught that one of the most basic machine learning models is called the decision tree. It is easy to understand and is the basic building block for some of the best models in data science. You start a decision tree by capturing a pattern and breaking it down into a question that can be answered with a yes or no. Capturing patterns from data is called fitting or training a model. Data that is used to fit a model is called training data. After this I learned that the biggest pitfall of a decision tree is that it doesn't capture most factors affecting the problem I am trying to solve. In order to rectify this I can capture more factors by using a tree that has more splits, which is called a deeper tree. After getting started on learning how exactly to do machine learning I learned more about the Titanic challenge and why to even do these challenges in the first place. I learned that a big reason to compete in Kaggle machine learning competitions is to have a fun, structured way to practice applying my skills to real-world problems. Doing these competitions is a good way to help with my own self-guided projects and introduces me to a community of other new people I can explore this field with. The Titanic competition is a challenge to build a predictive model that answers the question "what sorts of people were more likely to survive the Titanic sinking?" The steps of a Kaggle competition include understanding the problem, doing exploratory analysis of the data I have, training, tuning, and ensembling machine learning models, and uploading my predictions and receiving an accuracy score. Exploratory data analysis includes determining if there are any missing values or skewed fields, and getting to know my data. Ensembling is combining different machine learning models.
In order to improve my accuracy score I can learn more about my data, since there is always more to learn about a machine learning problem. Then I can use my newfound understanding of my data to guide experimentation. Experimentation can include designing and creating new features, trying different preprocessing, and trying different types of machine learning models. Lastly, one of the best ways to improve is to learn from others' code and ideas.
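The "yes/no question" idea behind a decision tree can be sketched as nested if statements. This is a hand-written toy, not a fitted model; the rules and the passenger fields are made-up assumptions, only loosely inspired by the Titanic problem:

```python
# A decision "tree" at its most basic: each split is a yes/no question.
# These rules are toy assumptions, not patterns learned from real data.
def predict_survival(passenger):
    if passenger["sex"] == "female":   # first split
        return 1                       # predict: survived
    if passenger["age"] < 10:          # a deeper tree: one more split
        return 1
    return 0                           # predict: did not survive

print(predict_survival({"sex": "male", "age": 8}))    # 1
print(predict_survival({"sex": "male", "age": 30}))   # 0
```

A real decision tree learns these splits from training data instead of having them written by hand, but the structure it ends up with looks just like this.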

After finishing today, I am looking forward to learning more about how to make my own machine learning models and being able to finish my own competition on Kaggle.

Fri. Jul. 15, 2022

Continuing in Python and another look into computers

Today I continued to learn more about Python and also a bit more about other aspects of computers. One of these aspects is algorithms, which I looked into again. Algorithms are shortcuts people use to tell computers what to do. Some of the big ways algorithms are used today include detecting motion without requiring motion sensors, simulating the human brain, and marketing algorithms that determine what to advertise to consumers.

One of the big things I learned about today is data cleaning, which is a key part of data science. Data cleaning helps to solve issues such as missing data values, incorrect data formats, inconsistent data entry, and so on. The first thing I need to do when I get a new data set is take a look at it. This lets me see if the data is correctly entered and what is going on in whatever data set I am looking at. Any missing values in the data are represented by NaN or None. After taking a first look at the data set I need to take a deeper look into the data, which includes trying to figure out why the data is the way it is and how that will affect my analysis. When I have missing values I need to use my intuition to figure out why each value is missing. I also learned that one of the most important questions I can ask myself about a dataset is whether a value is missing because it wasn't recorded or because it doesn't exist. When values are missing because they don't exist, I shouldn't try to guess what they might be; this lets me determine which values should be kept as missing. If a value is missing because it wasn't recorded, then I can try to guess what that data might have been based on other values in the column and row. Guessing missing values based on other values is called imputation. I learned that in order to carefully analyze data I would have to look at each column individually to figure out the best strategy for its missing values. One strategy to use when in a hurry, or when having no reason to figure out why values are missing, is removing any rows or columns that contain missing values. I learned that this strategy is not recommended, though, and that it is better to go through the data and get to know the data set I am working with. Using the pandas package, I can use the dropna() command in order to drop rows with missing values.
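A minimal sketch of spotting and dropping missing values with pandas; the tiny data set here is made up:

```python
import numpy as np
import pandas as pd

# A made-up data set with one missing value (NaN) in the age column.
df = pd.DataFrame({"name": ["Ada", "Bo", "Cy"],
                   "age": [36.0, np.nan, 28.0]})

print(df.isnull().sum())   # how many missing values each column has
print(df.dropna())         # drop any row that contains a missing value
```

dropna() here returns a new frame with 2 rows; the quick-and-dirty strategy the lesson warned about, since Bo's row is thrown away rather than understood.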

Another topic I've learned about is how to scale and normalize data. Both of these transform the values of numeric variables so that the transformed data points have specific helpful properties. When scaling, I change the range of my data. This means having the data fit within a certain range, and scaling is used with methods based on how far apart data points are. Scaling variables helps to compare different variables on equal footing. Normalization is changing the shape of the distribution of my data. Its purpose is to change my observations so that they can be described as a normal distribution. A normal distribution is a statistical distribution where a roughly equal number of observations fall above and below the mean. I learned that data should be normalized if it's going to be used for machine learning or statistics techniques that assume data is normally distributed.
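Scaling can be sketched in plain Python. This is min-max scaling, one common way of changing the range of data so it fits between 0 and 1 (the values are made up):

```python
# Min-max scaling: squeeze values into the 0-1 range so different
# variables can be compared on equal footing.
def min_max_scale(values):
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_scale([10, 20, 30, 50]))  # [0.0, 0.25, 0.5, 1.0]
```

Normalization is a different transformation (it reshapes the distribution rather than just the range), so libraries usually provide it separately.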

After the topic of scaling and normalization I learned about the various steps that go into a machine learning project. The steps are: understanding and defining the problem I have, analyzing and preparing data relevant to the problem, applying algorithms, reducing error, and predicting my result. I also learned that there are two datasets used in machine learning: training datasets, which are used to train my model, and testing datasets, which are used to test the model. The best way I found to improve at machine learning is doing end-to-end projects. These are projects that are not overly long and that I can complete myself. Along with these topics, I was also completing exercises for data cleaning that helped me get a better grasp on how to perform it.

Techie Youth inspired me to go out of my way and look for more data science resources, where I ended up finding an online course called DataQuest. I will definitely be using it in the future to help me get a better grasp of this field.

Thu. Jul. 14, 2022

Some of Python's most useful packages - Pandas Deep Dive

Today I learned more about Python, specifically one of its most used modules, called Pandas, and a bit more about functions.

Through Techie Youth I learned about a website called Kaggle, which has many datasets and also various courses that teach me the fundamentals I would need to know in order to analyze my own datasets. The courses that I did today are Python and most of Pandas on Kaggle. I learned that I can define my own functions using def. When using def I need to put a colon after the header and indent the following code. I can use return at the end of a function; when Python encounters a return statement, it exits the function immediately and passes the value on the right side back to the calling context. I can use a docstring to provide a description for a function. A docstring is a triple-quoted string, which can span multiple lines, that comes immediately after the header of a function. When I call help() on a function, it shows the docstring. When a function doesn't use return, it gives back None. I also learned how to pass functions to other functions. Another command I learned is abs(), which takes a single argument and returns its absolute value. I can also highlight lines and press ctrl + / in order to comment or uncomment them. Comments are lines of code that are not run but are there to improve readability. I can use import * to make all of a module's variables directly accessible without any dotted prefix, though it is usually better to import only the specific things I'll need from a module if I do not want to use the dotted prefix.
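The def, docstring, and return points above can be shown in one small function (the function itself is just an illustrative example):

```python
def least_difference(a, b, c):
    """Return the smallest difference between any two of a, b and c.

    This triple-quoted docstring is what help(least_difference) displays.
    """
    diffs = [abs(a - b), abs(b - c), abs(a - c)]
    return min(diffs)   # return exits the function and hands this value back

print(least_difference(1, 10, 100))  # 9

def no_return():
    pass   # no return statement anywhere

print(no_return())  # None - a function without return gives back None
```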

Pandas is the most popular Python library for data analysis. The most important aspects of the pandas package are the DataFrame and the Series. I learned about DataFrames yesterday, while a Series is a sequence of data values. If a DataFrame is a table, a Series is a list. You can create a DataFrame with nothing more than a list, and a Series is essentially a single column of a DataFrame. I can assign row labels to a Series using the index parameter, the same way as with a DataFrame. A Series, however, does not have a column name, only one overall name. To make it easier to understand, a DataFrame was equated to a bunch of Series glued together.
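A quick sketch of a Series next to a DataFrame; the sales numbers and labels are made up:

```python
import pandas as pd

# A Series: one labeled column with an index and one overall name.
s = pd.Series([30, 35, 40],
              index=["2015 Sales", "2016 Sales", "2017 Sales"],
              name="Product A")
print(s["2016 Sales"])  # 35 - look values up by their row label

# A DataFrame: several Series "glued together" into a table.
df = pd.DataFrame({"Yes": [50, 21], "No": [131, 2]})
print(df.shape)  # (2, 2) - 2 rows, 2 columns
```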

I've learned that there are many forms and formats to store data, with the most basic of these being CSV. CSV stands for comma separated values and is essentially just that: a list of data all separated by commas. I also learned various commands used in Pandas. head() is used to examine the first 5 rows of a data set. pd.read_csv() is used to read data into a DataFrame, and has over 30 optional parameters. When a data set has a built-in index, I can tell pandas to use that column as the index by specifying it with the index_col parameter. As I talked about yesterday, pandas is also widely used because it handles missing entries in a data set well. Entries that are missing are given the value NaN, short for "not a number". The type of NaN is float64. There are various methods specific to missing data in pandas. In order to select NaN entries I can use the pd.isnull() or pd.notnull() commands. Values can also be replaced using pandas. The command for that is fillna(), which provides various ways to mitigate missing data. One thing that is possible with that command is replacing any NaN data entries with "Unknown". Another possible strategy is using the replace() method. Replace is useful for replacing missing data which is given some kind of sentinel value in a data set. A sentinel value is a special placeholder value, such as -999, that a data set uses to stand in for missing or invalid entries.
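The fillna() and replace() strategies can be sketched on a tiny made-up frame (the -999 sentinel is an assumption for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"city": ["NYC", np.nan, "LA"],
                   "temp": [75, 68, -999]})   # -999 is a made-up sentinel

df["city"] = df["city"].fillna("Unknown")      # fill real NaN entries
df["temp"] = df["temp"].replace(-999, np.nan)  # turn the sentinel into NaN

print(df["city"].tolist())        # ['NYC', 'Unknown', 'LA']
print(int(df["temp"].isnull().sum()))  # 1
```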

Following each topic, I also had to do code exercises through Kaggle. The exercises involved applying the topics I had just learned to problems which were given to me to solve. I have found learning about Python's use in analyzing data very enjoyable. I have also liked what Kaggle provides and am looking forward to when I am able to join one of their competitions and analyze a set of data on my own. I will be looking for further Python data science resources in order to strengthen my skill set.

Wed. Jul. 13, 2022

Going further into Python, data science perspective

Today I continued to work on improving my python skills through various resources I've been provided with.

The first thing I did was continue DataCamp's Python tutorial and even go into their intermediate tutorial section. I learned that data visualization is a very important part of data analysis. You need data visualization to share valuable insights with other people. There is a package called matplotlib which is the biggest data visualization package for Python. From the matplotlib package you can use plt.plot() to plot data points. This command creates a plot with all the data points connected; it tells Python what to plot and how to plot it. You need to use plt.show() to actually display the plot you make. To make a scatter plot you can use the plt.scatter() command to plot individual data points without connecting them. A scatter plot is often the better choice over a line plot for many applications. Scatter plots are also a more honest way to plot data, since you can clearly see how many data points a graph is based on.

Histograms are another type of plot I learned about, and another way of visualizing data. Histograms help to get an idea about the distribution of a variable. To build a histogram you first need a line of data points, which is then divided into equal chunks called bins. The data points in each bin are counted, and that count is the height of each bar. You use plt.hist() in order to make a histogram. The histogram command has various arguments, two of them being x and bins. X should be a list of values you want to build a histogram for. You can use the second argument, bins, to tell Python how many bins the data should be divided into. Python will automatically determine the range of the values and where each data point falls. If bins is not specified, Python defaults to 10. Histograms are useful for giving a bigger picture. One way they're useful is being able to easily see how demographics change over time. After learning about the basics of a histogram I was taught how to use them and made my own using data provided.
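A small histogram sketch; the data points are made up, and the Agg backend line just makes it draw off-screen (drop it to get a window):

```python
import matplotlib
matplotlib.use("Agg")   # draw off-screen; remove this line to open a window
import matplotlib.pyplot as plt

values = [1, 1, 2, 3, 3, 3, 4, 7, 8, 9]      # made-up data points
counts, edges, _ = plt.hist(values, bins=3)  # split the range into 3 bins
plt.xlabel("value")
plt.ylabel("count")
plt.show()

print(counts)  # [6. 1. 3.] - the bar heights, i.e. points per bin
```

plt.hist returns the bin counts and edges as well as drawing the bars, which makes it easy to check where each data point landed.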

Along with learning about and applying histograms and scatter plots, I learned how to customize plots. You can change plot types, colors, shapes, labels, axes, etc. The choices you make depend on the data that you have and the story that you want the data to tell. You can label the axes of a plot using plt.xlabel() and plt.ylabel(). These functions should be called before plt.show(), because otherwise any customizations you make will not be displayed when the plot comes up. I learned to use plt.title() to set the title for a plot, passing the title as a string. I found out that in order to set the axes of a plot from the start I can use the ticks functions: plt.yticks() and plt.xticks() for the y and x axes. You can pass a list to the ticks functions in order to determine the scale for the axis, and a second argument can be added which is a list of display names for the ticks. I can add to plots by renaming the variable containing the plot's data and adding it to a list of other data; the information gets sorted on its own when combining them. The plt.scatter() command, like plt.hist(), also has a wide array of arguments that can be used. Some of the arguments of scatter are x, y, s, c and alpha. X and y are the data points themselves, s is the size of the data points, and c sets the colors of the different data points. Alpha is the transparency of the points in a scatter plot, between 0 and 1, with 0 being fully transparent and 1 being fully opaque.
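Putting the customizations together in one sketch (the population figures are rough approximations, and the Agg line just draws off-screen):

```python
import matplotlib
matplotlib.use("Agg")   # draw off-screen; remove this line to open a window
import matplotlib.pyplot as plt

year = [1950, 1970, 1990, 2010]
pop = [2.5, 3.7, 5.3, 7.0]   # world population in billions (rough figures)

plt.scatter(year, pop, s=80, c="red", alpha=0.8)  # size, color, transparency
plt.xlabel("Year")                    # customizations go before plt.show()
plt.ylabel("Population (billions)")
plt.title("World Population")
plt.yticks([0, 2, 4, 6, 8], ["0", "2B", "4B", "6B", "8B"])  # scale + names
plt.show()
```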

After the various plots I learned about dictionaries in Python. A dictionary is a collection which is ordered, changeable, and does not allow duplicates. Dictionaries are used to store data values in key:value pairs. Ordered means that the items have a defined order and that the order will not change. Dictionaries have their own class in Python, dict, along with their own set of methods, like many other objects in Python. Following dictionaries I learned about the pandas DataFrame. The DataFrame comes from the pandas package, and it is a 2-dimensional data structure, like a 2D array or a table with rows and columns. It can be used for a wide variety of operations and analytics. A pandas DataFrame consists of three principal components: the data, the rows, and the columns. Some good reasons to use pandas include it being able to easily handle missing data, providing an efficient way to slice data, and providing a flexible way to merge, concatenate, or reshape the data. After learning a bit about the pandas DataFrame I also started to work with it and learn how to use it.
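The dictionary points above in a few lines (the countries and capitals are just example data):

```python
# A dictionary stores data as key:value pairs.
europe = {"spain": "madrid", "france": "paris", "germany": "berlin"}

europe["italy"] = "rome"    # changeable: add a new pair
print(europe["france"])     # paris - look a value up by its key
print(list(europe))         # keys keep their defined insertion order
```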

Along with learning about Python, I started to look into which Kaggle competitions I would be interested in doing. I will be starting with the Titanic data set to get a grasp of what I would be doing. I have also learned more aspects of Python through Kaggle's free courses. I am looking forward to learning more through Kaggle and more about Python in general so I can start working on the competitions provided and express some of what I am learning.

Tue. Jul. 12, 2022

Jumping into Python

Today I began to take a deeper dive into Python in order to get a grasp of what I must do for the AI assignment. Under the technical skills section of AI, I've been on the welcome-to-data-science Python tutorials, where I've learned a lot.

I first learned about specific functions in Python and other basics. Some of those functions are len, int, and sorted. Sorted takes three arguments: iterable, key, and reverse. An iterable is an object that can be looped over or iterated. Afterwards I learned how to add elements to a list using append. I saw that every data structure in Python is called an object. Objects have types and also methods. Methods are functions specific to an object. Some examples of methods are strings having capitalize and replace, integers having bit_length() and conjugate(), and lists having index and count. Index gives the position of an element in a list and count shows the number of times a specified element appears in the list. A tuple is a fixed-size list. I then learned what the % operator is used for: % formats a set of variables together with a format string, which contains normal text together with "argument specifiers", special symbols. Argument specifiers are %s for strings, %d for integers, %f for floating point numbers, %.<# of digits>f for floating point numbers with a fixed number of digits to the right of the dot, and %x/%X for integers in hex representation.
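Each of the argument specifiers in one quick sketch (the values are made up):

```python
name = "Ada"                      # an example value
print("Hello, %s!" % name)        # %s - string      -> Hello, Ada!
print("%d items" % 3)             # %d - integer     -> 3 items
print("%.2f" % 3.14159)           # fixed digits     -> 3.14
print("%x" % 255)                 # hex integer      -> ff
```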

Through the Python welcome tutorial I learned that Python uses boolean logic to evaluate conditions. Boolean logic means that whatever you are evaluating can only be true or false. Some boolean operators I learned are "and" and "or", which are used to build complex boolean expressions. "In" is used to check if a specified object exists within an iterable object container such as a list. "Is", unlike "==" (equals to), does not match the values of variables but the instances themselves. "Not" inverts a boolean. Regarding boolean logic, a statement is considered true if the "true" value is given or calculated using an expression, or if an object which is not considered "empty" is passed. Some objects that are considered "empty" are an empty string, an empty list, the number zero, and the "false" boolean value. Following conditions I learned about loops and other functions related to them. "For" loops iterate over a given sequence and can iterate over a sequence of numbers using the "range" function (or "xrange", which exists only in Python 2). "While" loops repeat as long as a certain boolean condition is met. Break is used to exit a for or while loop, while continue is used to skip the current block and return to the "for" or "while" statement.
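The loop keywords above can be shown together in a short sketch:

```python
primes = [2, 3, 5, 7]
for p in primes:          # "for" iterates over a sequence
    if p == 5:
        continue          # skip the rest of this iteration
    print(p)              # prints 2, 3, 7

n = 0
while n < 3:              # "while" repeats while the condition holds
    n += 1                # n ends up as 3

for i in range(10):
    if i > 4:
        break             # exit the loop early; i is left at 5

print(3 in primes, 4 in primes)   # True False - the "in" check
```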

I was taught that to make up for the deficiencies in plain Python there are things called packages. Packages are a directory of Python scripts where each script is called a module. Packages specify functions and new Python types aimed at solving particular problems. There are thousands of packages available, not all available by default. You need to install packages on your system, for example with pip's install command, and then import them before you can use them in a script. You can import specific functions from packages, but if you do so you should still properly label which package a function comes from to make it easier for a reader to understand.

After learning some Python basics, through DataCamp's tutorial I learned that a common package used by data scientists is NumPy, short for numeric Python. This is a Python package that provides an alternative to the Python list: the NumPy array. It allows you to perform calculations over entire lists, making calculations much easier. A big difference between an array and a list is that an array is only allowed to contain one data type, which makes NumPy much faster. You can work with NumPy arrays in most of the same ways as Python lists, but you have to watch out for the behaviors that differ between the two.

I further expanded on NumPy arrays through more practice today. I learned about 2D arrays in NumPy, which are lists nested inside a list. Afterwards I learned how to manipulate 2D arrays. I can also run basic statistics on data using NumPy, such as mean, median, corrcoef (seeing if data is correlated), and standard deviation. You can also use NumPy to generate large lists of data. The function that does this is np.random.normal(), which takes the distribution mean, the distribution standard deviation, and the number of samples. I also applied methods to 2D arrays in order to get useful information out of data.
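A 2D array and a couple of the statistics functions in one sketch; the measurements are made up:

```python
import numpy as np

# Two rows of made-up measurements: heights (m) and weights (kg).
np_2d = np.array([[1.73, 1.68, 1.71],
                  [65.4, 59.2, 63.6]])

print(np_2d.shape)                     # (2, 3) - 2 rows, 3 columns
print(np_2d[0][1])                     # 1.68 - row 0, column 1
print(np.median(np_2d[1]))             # 63.6 - median weight
print(round(float(np.mean(np_2d[1])), 2))  # 62.73 - mean weight
```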

I feel very motivated learning about these core data science topics and after completing DataCamp's Introduction to Python curriculum, will be working on their data science focused Intermediate Python tutorial in order to better prepare myself for handling the data sets I need to as part of this unit.

Mon. Jul. 11, 2022

Various aspects of AI and Python introduction

Today the first thing I've learned about is the different parts of AI and how they're used. Neural networks are a subset of AI and are a means of doing machine learning in which a computer learns to perform some task by analyzing training examples. These examples are usually hand-labeled in advance. Neural nets are loosely modeled on the human brain: a neural net consists of thousands, up to millions, of simple processing nodes that are densely connected like neurons. These nodes receive data that is then multiplied by a weight, and if that product is not above a certain threshold, the data is not sent on to the next layer of nodes. Information in a neural network also only moves one way. I've also learned that neural networks were fading in use until recently, when new methods for training and analyzing them came out. I have also learned about different resources for finding datasets that I can use in the future to practice my analyzing skills.

After learning more about neural networks I've learned of the impact that AI has on the robotics field today. AI is revolutionizing the robotics field, with no signs of it stopping. AI today is mainly used for enhancing industrial robotics capabilities. Some of the reasons for using AI in robots include AI robots having better mobility, robots being able to teach themselves how to handle objects they haven't handled before in a short period of time, being able to ensure optimum reliability and accuracy, having uses in customer service, and being able to make robots that can handle hazardous tasks. Even with all of these advantages, there are some downsides to using AI in robotics. Some of these disadvantages are a possible loss of jobs and a lack of creativity when it comes to solving problems.

I've further learned what Natural Language Processing (NLP) is. NLP deals with how computers understand and translate human language. With NLP, machines can make sense of written or spoken text. This can be used to make machines perform tasks like translation and keyword extraction. I also found out that even though NLP uses machine learning, there is a big difference between the two. NLP uses machine learning to automate the processes in which it is used and deliver accurate results. Machine learning is the process of applying algorithms that teach machines how to automatically learn and improve from experience without being explicitly programmed. There are two techniques used in NLP: semantic analysis and syntactic analysis. Semantic analysis focuses on capturing the meaning of text; it studies the meaning of each individual word, then looks at the combination of words and what they mean in context. Syntactic analysis analyzes text using basic grammar rules to identify sentence structure, how words are organized, and how words relate to each other. NLP is used in business today for translating, analyzing insights into how customers feel about brands or products, and many other things.

Through Techie Youth I learned that data analysis has a multitude of uses in the world now, from developing targeted marketing campaigns and helping businesses optimize their marketing strategy to disease recognition, pattern recognition, board and video games, medical diagnosis, machine translation, and many, many more.

The final thing I learned today was various aspects of python and beginning to learn the language itself. In python today I learned about lists, integers, strings, variables and floats. I learned what the print function does, python syntax, how to manipulate lists and reference them, and I started to get into what functions are and how they are used. I am looking forward to learning more about python and how to apply it to data science and AI.

Fri. Jul. 8, 2022

The rise of Artificial Intelligence - What I learned today at Techie Youth

One of the first things I learned was that jobs that use artificial intelligence are rising rapidly and paying well today, with many possibilities of working remotely. There are even many opportunities to make a career out of this without getting a degree. Techie Youth showed me some of the various steps I can take to get myself more acquainted with the field and some of what I can start doing today in order to learn this and put myself in a better position to be successful in the future.

Another topic I've learned about is pseudocode, algorithms, and what they are. Pseudocode is a way to describe code through a mix of English and features that you see in programming languages. An algorithm is a set of steps to accomplish a task. It is important to know how to design new algorithms, as well as how to analyze their correctness and efficiency. In order to measure the efficiency of an algorithm we use a method called asymptotic analysis, which allows algorithms to be compared independently of a particular programming language or hardware, so that we can conclusively decide how some algorithms are more efficient than others.

Before we even make an algorithm, the preexisting rules need to be established. After establishing these rules, I learned of different aspects of algorithms including linear search and binary search. Linear search is going linearly through an array. Binary search is an efficient algorithm for finding an item in a sorted list. The main idea of binary search is to keep track of the current range of reasonable guesses and, in each step, to divide that range into two ranges of roughly the same size, discarding the half that cannot contain the item. You keep following this until you arrive at whatever index you're looking for in the array.
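The halving idea can be sketched directly (the array here is an arbitrary sorted example):

```python
def binary_search(arr, target):
    """Return the index of target in sorted arr, or -1 if absent."""
    lo, hi = 0, len(arr) - 1          # the current range of guesses
    while lo <= hi:
        mid = (lo + hi) // 2          # guess the middle of the range
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1              # discard the lower half
        else:
            hi = mid - 1              # discard the upper half
    return -1

print(binary_search([2, 3, 5, 7, 11, 13], 11))  # 4
```

Because the range halves each step, binary search takes only about log2(n) guesses, versus up to n for linear search.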

Another aspect of algorithms is running time, which is important in figuring out the efficiency of an algorithm. The running time of an algorithm can be thought of as a mix of two ideas. The first of these ideas is that run time is thought of as a function of the size of its input. The other idea is that we must focus on how fast a function grows with the input size, which is called the growth rate of the running time. The function needs to be simplified to keep only the most important parts. I learned that the running time of an algorithm is described by the symbol Θ(n), or theta of n. When running time is described as Θ(n), that means that once n, the number of times a loop is iterated, is large enough, the run time is at least k1 x n and at most k2 x n. In other words, the run time of the function falls between k1 times n and k2 times n for some constants k1 and k2. Using theta notation also means that we're giving an asymptotically tight bound on the running time: asymptotically because this only matters for large values of n, and tight bound because we've narrowed the running time to within a constant factor above and below.

I learned that sorting a list of items into ascending or descending order can help a human or computer find items in that list quickly. Furthermore, I learned of two different methods used when sorting with an algorithm. The first method is called selection sort. In selection sort you repeatedly go through the array, select the smallest remaining element, and move it into the sorted portion. A downfall of selection sort is how quickly the run time goes up in comparison to n, the number of values. The second sorting method I was taught about is insertion sort. Using insertion sort you loop over positions in an array starting with index one. Each new position's value is inserted into the correct place in the sorted subarray to the left of that position. The main step in insertion sort is making space in the array to put the current value in.
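Insertion sort as described above can be sketched like this (the list is an arbitrary example):

```python
def insertion_sort(arr):
    """Sort arr in place: start at index 1 and insert each value
    into the sorted subarray to its left."""
    for i in range(1, len(arr)):
        value = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > value:
            arr[j + 1] = arr[j]   # make space by shifting bigger values right
            j -= 1
        arr[j + 1] = value        # drop the current value into the gap
    return arr

print(insertion_sort([5, 2, 4, 6, 1, 3]))  # [1, 2, 3, 4, 5, 6]
```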

Another kind of algorithm I found out about is the recursive algorithm. This is where an algorithm solves a problem by solving smaller instances of the same problem, unless the problem is so small that it can be solved directly. The solution to that smaller instance is then used to solve the original problem. In order for a recursive algorithm to work, the smaller subproblems must eventually become simple enough to solve directly. I learned the rules of recursion, the first of which is that each recursive call should be on a smaller instance of the same problem, that is, a smaller subproblem. The second rule is that recursive calls must eventually reach a base case without further recursion.
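Both rules show up in the classic factorial example:

```python
def factorial(n):
    if n <= 1:                       # base case: solved directly, no recursion
        return 1
    return n * factorial(n - 1)      # recursive call on a smaller instance

print(factorial(5))  # 120
```

Each call is on a smaller n, so the calls must eventually hit the base case at n <= 1, satisfying both rules of recursion.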

On the matter of artificial intelligence, I learned that there are six major branches of it: machine learning, neural networks, robotics, expert systems, fuzzy logic, and natural language processing. Machine learning is a technique that gives computers the ability to learn without being explicitly programmed. It is the science that enables machines to translate, execute, and investigate data for solving real-world problems. A neural network is a set of algorithms used to find elemental relationships across bunches of data via a process that imitates the way the human brain operates. Robotics focuses on designing and constructing robots. Expert systems refers to computer systems that mimic the decision-making intelligence of a human. Fuzzy logic is a technique that represents and modifies uncertain information by measuring the degree to which a hypothesis is correct. Finally, natural language processing is the part of computer science and AI that helps computers and humans communicate in natural language. After looking at the major branches of AI, I think that a job that focuses on using machine learning along with neural networks would be really interesting to me.

I've learned that even without a degree there are various ways to get a machine learning job. The most important one is through getting experience. There are tons of resources online about learning the skills necessary for these jobs, and after getting the necessary skills you can move onto the experience building part. A big way to build up experience is through freelance work. This can include reaching out to charities or visiting websites where businesses post contracts.

In order to show me the potential that AI and automation has, I even read about there being twelve different sectors in which automation will be taking over in the short term. This shows me that even now it is never a bad idea to get into a field related to automation and AI.

After seeing the potential of AI I learned of two different methods of thinking, one of which is applied to AI, and a big technique that is used today in this burgeoning field. One of the thinking methods is case-based reasoning, which is an experience-based approach to solving new problems by adapting previously successful solutions to similar problems. I also learned of algorithmic thinking, which is thinking by breaking down the exact and individual steps of what we are doing. Its biggest use is in breaking down a problem in order to make it easier to solve. After the thinking methods is when I learned of something called predictive analytics. Predictive analytics is used widely in the field of AI today and is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical outcomes. It is used to help provide assessments of what will happen in the future. Like machine learning and AI, this is being used in a wide array of sectors today.

Thu. Jul. 7, 2022

What is your time worth and how to make the most of your life

Today I started to learn of the value of my time and how to determine it. The easiest way to find out how much your time is worth is figuring out your hourly salary and applying that to yourself. After you know how much your time is worth, then you can start to determine things you could be doing that would actually save you more in the long run. Some simple ways to save time are getting a dishwasher, shortening your commute time through moving, and not sleeping for too long. I've learned that I need to maximize the usage of my time to get the most out of life. Spending more money sometimes can actually save you more in the long run over cheaper options. Another thing I can do is avoid things that will slow me down in life such as overeating. One of the biggest things I can do to use my time well is plan my day out and prepare for the next day. I need to keep in mind that time is elastic and that the key to managing my time is getting my priorities straight.

The biggest thing I could be doing with the time that I start to save by following some of the earlier steps is being more productive. A key to being productive is finding things I can do that bring me joy. I've learned the art of being hyper-productive, which is doing something beneficial for myself every day. I should not waste time in a way that doesn't help me improve and grow. It is also important to make sure to learn skills that are cost-effective, as in skills which will help me earn more money down the road. I should choose a goal that I want to work towards and learn skills that will help me achieve that goal.

I've also learned about how time is wasted from a business perspective. One of the biggest ways time is wasted at a job is through scope creep: as work goes on, more and more things get added to a project, leading to delays. I've seen that there is something called the triple constraint when it comes to business. This refers to a triangle with quality at the center and three aspects, scope, schedule, and budget, pulling on its corners. If any one aspect moves, one or both of the others have to move as well, or else there is a problem with quality. Something that applies to time wasted from both a business and a personal viewpoint is Parkinson's Law. This law states that work expands so as to fill the time available for its completion; in other words, a task will take as long as its deadline allows. A good way to combat this is setting shorter, but still reasonable, deadlines for projects so that more actual work gets done. The law also implies that if there is no deadline for work, it will never get done. This helps me see that I need to set good deadlines for the things I do myself in order to complete them.

Techie Youth taught me various tools and methods that can be used to make the most of one's time, both in business and personally. One such tool is a kanban board, which is a project management tool used to manage work. There are multiple columns listing the stages of a project, and you move a sticky note containing a task from one column to the next as workflow progresses. Another important way to maximize time usage is to avoid multitasking. I learned of studies concluding that multitasking kills productivity, and that our brains don't really multitask but switch quickly from one task to another. Practicing mindfulness and setting blocks of time to get specific work done is another way to make the most of my time. In order to avoid losing time I should also invest in my health and well-being. I need to exercise more to be happier and to connect with others often. I learned how important my mental health is and that it is just as important to look after as my physical health.
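
The kanban idea is simple enough to model in a few lines; this sketch uses column names and tasks of my own invention:

```python
# Minimal kanban-board sketch: columns are stages, tasks are "sticky notes"
board = {
    "To Do": ["write report", "email client"],
    "In Progress": [],
    "Done": [],
}

def move(board, task, src, dst):
    """Move a sticky note (task) from one column to the next."""
    board[src].remove(task)
    board[dst].append(task)

# A task flows through the stages as work progresses
move(board, "write report", "To Do", "In Progress")
move(board, "write report", "In Progress", "Done")
print(board["Done"])  # ['write report']
```

Real tools like Trello add owners, due dates, and limits on work in progress, but the column-to-column flow is the whole core of the method.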

After learning about time management and my own well-being, I started to learn of ways to come across as a good leader. Some important things a leader should do are have a strong voice, avoid pausing while speaking, not speak too much, try not to fidget, and not have a crutch such as smoking or drinking. I learned that a big aspect of being a leader is not so much having no flaws, which is nearly impossible since we are only human, but hiding those flaws from others. A good leader should also give genuine feedback, dress well and have good hygiene, make eye contact, be in good shape, and have strong self control. These are not all of the aspects a good leader should have, but a list of good points to start from in order to project yourself as someone fit to lead others.

In the same vein of qualities a good leader should have, a good leader must be able to manage their stress well. I learned that a key way to manage my stress level is to not expect everything to go my way. I need to go with the flow and be more adaptable in life. Another thing I must do is set realistic expectations. I can set lofty goals, but what I expect myself to do needs to be attainable. I can also meditate in order to manage my stress. Following stress management, it is important to surround myself only with positive-thinking people. I've realized that I need to distance myself from people who don't support me or are envious of me, because more likely than not they will be detrimental to my career or well-being.

Finally, I was taught that having a good role model can be very important to achieving my goals. Even when having a role model, I should aspire to be a whole model, which starts by living my life with intention. I need to live with the intention of changing my life and the lives of everyone that I meet. Along with having a role model, getting a mentor can also be very beneficial. When looking for a mentor, I need to have a goal in mind and do my research beforehand. In order to find a mentor I also need to demonstrate what I can offer them, be it enthusiasm, passion, drive, or anything of the sort. I need to ask specific questions that are relevant to their expertise. I've also learned that even while looking for a mentor, I need to start and forge my own path. Mentors are there to provide me guidance, not solve my problems, and in order to receive guidance I need something they can actually guide me on. When it comes to my career, it is never too late to start. Everyone has to start somewhere, and the only thing stopping me is my fear of failure.

Wed. Jul. 6, 2022

Handling Remote work to financial advice - What I learned today through Techie Youth

Today I first learned various ways to stand out on a resume. Some of these include writing an article that gets published, public speaking, and going to local meetups.

I've also learned of ways I can be a better remote worker. I should use meetings to collaborate with colleagues and debrief after a project to learn more for the next one. In order to avoid issues with colleagues while working remotely I should watch for tone in emails and messages and try to empathize with others more. I should not assume anything when working with others in order to avoid any miscommunication. Finally, I should not micromanage teammates, be proactive and be able to work independently.

In today's world, remote work is becoming more and more common, so I've been taught ways to socialize more while working remotely. Some of these methods are having scheduled video chats and having non work related chat channels to make up for there being no small talk. I've also learned what Slack and Doodle are. Slack is a messenger app with chat channels, while Doodle is a website that lets you propose possible times for a meeting to avoid the hassle of coordinating when people are free.

Furthermore I've learned about the tactics that bad employers use and how to counter them. One of them is being pressured to work longer than I'm scheduled for in arrangements where I do not get overtime pay. One way I can combat this is coming in later the next day by however many hours I stayed late the previous day, or leaving earlier in the day to make up for the time I lost. I should be comfortable saying no to projects if I do not think I will be able to fulfill everything required of me. I also learned that scope creep is a big way people get taken advantage of at work. This includes getting more responsibilities over time while not receiving sufficient compensation for the extra work being asked of you. You should place your loyalty with people and not companies; most of the time, companies will not hesitate to lay you off.

I also learned the importance of networking, common places to network, and what to do when networking. Common places to network include conferences, chamber of commerce events, professional meetings, and job fairs. When actually going to network I should look up the position people hold to know if they would actually be a useful connection to have.

I was taught how to negotiate prices as well. Firstly I should figure out how desperate the seller is to sell their product; those desperate to make a sale are easier to get a better price from. While negotiating I can mention wanting a fair price on the items and pick out any flaws in the product to use as leverage.

Finally I learned of various things I should do when I actually have money. I should make a budget; I would make a spreadsheet of monthly expenses to keep track of spending and save money. Other things I should do are avoiding lifestyle inflation, investing long term, and not buying unnecessary things. Lastly I learned of various steps I should take when I get money. These steps include firstly putting money into a retirement fund. After that I should put money towards necessities and then pay off loans. I can use two methods of paying off loans, the avalanche or the snowball method. For the snowball method I pay loans based on the total balance left, starting with the lowest amount. To use the avalanche method I pay loans off depending on how high the interest rate is, starting with the highest. After paying off loans I should save 4-8 months of pay and lastly invest any excess money I have. Various methods of investing my money include investing in real estate, the stock market, bonds, arbitrage sales, micro loans, stock dividends, and ETFs. If possible I should try to have a source of passive income as well.
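
The difference between the snowball and avalanche methods comes down to how you order the loans, which can be sketched as two simple sorts (the loan figures below are hypothetical):

```python
# Hypothetical loans; the point is the two payoff orderings, not the numbers
loans = [
    {"name": "car",     "balance": 8_000,  "rate": 0.04},
    {"name": "card",    "balance": 2_000,  "rate": 0.22},
    {"name": "student", "balance": 15_000, "rate": 0.06},
]

# Snowball: pay off the smallest total balance first (quick wins)
snowball = sorted(loans, key=lambda l: l["balance"])

# Avalanche: pay off the highest interest rate first (least total interest)
avalanche = sorted(loans, key=lambda l: l["rate"], reverse=True)

print([l["name"] for l in snowball])   # ['card', 'car', 'student']
print([l["name"] for l in avalanche])  # ['card', 'student', 'car']
```

The avalanche ordering minimizes total interest paid, while the snowball ordering clears individual loans sooner, which many people find more motivating.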

Tue. Jul. 5, 2022

What exactly does Techie Youth do? - The process of getting a job and what to do beforehand

I learned today that Techie Youth is a company trying to provide youths with the skills they need to get a job in today's tech sector. I learned not to work for the miserly and not to compromise my morals for a job, and that, if possible, I should work somewhere I love in order to be more passionate about my work.

There are various ways businesses process transactions and different ways of getting paid myself. Businesses either use a payment aggregator or have a merchant account. A payment aggregator is a company that has a merchant account and lets businesses come in under them as sub-merchants. What this means is that you deal with a middleman instead of a bank, with the upside that there is no vetting process to get accepted. Having a merchant account means a business has an account directly with a bank, but you need to get vetted before being able to open said account. Payment aggregators are easier to start off with, but once a business is earning a larger amount of income each month, it would often save money by applying for a merchant account, since merchant accounts tend to have lower fees than payment aggregators.
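
That trade-off can be illustrated with a toy fee comparison. The rates and monthly fee below are assumptions for the sake of the example, not any real provider's pricing:

```python
# Illustrative fee structures only; real rates vary by provider
def aggregator_fee(volume):
    """Flat per-transaction rate, no monthly fee (assumed 2.9%)."""
    return volume * 0.029

def merchant_account_fee(volume):
    """Lower per-transaction rate plus a fixed monthly fee (assumed numbers)."""
    return volume * 0.019 + 25

# At low monthly volume the aggregator is cheaper...
print(aggregator_fee(1_000) < merchant_account_fee(1_000))    # True
# ...but at higher volume the merchant account saves money
print(merchant_account_fee(10_000) < aggregator_fee(10_000))  # True
```

The crossover point depends entirely on the actual fee schedules, but the shape of the comparison, flat rate versus lower rate plus fixed cost, is the general pattern.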

I also learned about the various benefits of working remotely, including but not limited to, there being no commute, being able to work from any location, and having increased productivity from being able to choose your work environment.

Through the help of Techie Youth I was able to make a resume and cover letter, and update my LinkedIn. I further learned on how to improve my public image.

Techie Youth has also helped me to be better prepared for future job interviews. I've learned that for an interview I should be well dressed, take notes during the interview, come in with questions, and bring examples of my work if applicable. Another thing I learned was that I should always thank the interviewer within 24 hours after the interview. When doing video interviews, some of the things I need to do are make sure my camera is at eye level, have the lighting behind my camera, have a clean background that isn't distracting, and always look into the camera when talking. I need to be hungry in order to get opportunities ahead of others and be able to get what I want in life.

In order to have better standing going into an interview, I should never be afraid to walk away from a deal. In that same line of thinking, I should try not to be in a position where I can't walk away, because then I've lost most of my negotiating power.

Techie Youth taught me that there are many important aspects to negotiating my salary in a job interview. I should never provide a salary expectation at the start: giving a number first might set my expectation too low or price myself out, and ranges are no good either. I should try to get a sense of the scope of the work in order to figure out for myself what a good salary would be, while not being too stubborn about refusing to give a number first. I should always make sure to know the terms of payment for any job and have them written out. Finally, I learned not to be afraid to ask for the job after the interviewing process; the worst they can do is say no, and it'll show that I have interest in the job.