
  • Linear Regression Algorithm



    Linear regression is an algorithm more than 200 years old that is used to predict a property from a training data set.

    In this blog we will learn:

    What linear regression is
    How to calculate statistical quantities from a training data set
    How to calculate linear regression coefficients from a data set
    How to make predictions using linear regression
    How to use the sklearn library to make predictions with linear regression

    Linear Regression

    Simple linear regression models the relationship between an independent variable and a dependent variable as a straight line. The equation of that line is

    y = mx + c

    Here, y is the dependent variable and x is the independent variable. We need to estimate the slope (m) and the y-intercept (c) from the training data set; once we have these coefficients, we can use the equation to predict y for any given x. But why a straight line?

    Suppose that we have data for the per capita income of the US (in dollars) for the years 1970 to 2016.

    I will work in a Jupyter notebook and use several Python libraries (pandas, numpy, sklearn and matplotlib), imported under their usual aliases.
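    As a quick illustration of how a straight line equation is used once its two coefficients are known, here is a minimal sketch. The slope and intercept values below are made-up placeholders, not the ones fitted later in this blog, and predict_line is a hypothetical helper name.

```python
def predict_line(x, slope, intercept):
    """Return the y value on the straight line y = slope * x + intercept."""
    return slope * x + intercept

# Hypothetical coefficients, chosen only for illustration.
slope, intercept = 2.0, 1.0

# Each step of 1 in x changes y by exactly the slope.
for x in [0, 1, 2, 3]:
    print(x, predict_line(x, slope, intercept))
```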

    import pandas as pd

    import numpy as np

    import matplotlib.pyplot as plt

    We will then plot the available data (the training data set) as a scatter plot.

    The first five rows of the two columns of the data are as follows:

    df = pd.read_csv('percapita.csv')

    df.head()   # first five rows of the file

     

     

           year   per_capita
    0      1970  3399.299037
    1      1971  3768.297935
    2      1972  4251.175484
    3      1973  4804.463248
    4      1974  5576.514583

    Now let us plot the above data, which has 47 rows and 2 columns:

    %matplotlib inline


    plt.scatter(df.year,df.per_capita)

    <matplotlib.collections.PathCollection at 0x7f3a4437e208>

    Many different straight lines could be drawn through these points and used to make predictions.

    To find the line that best fits the data, that is, the one with the least error, we need to calculate the coefficients of the equation.

    To calculate these coefficients, first calculate the mean of each of the two properties, and then find how far each value lies from its mean.

    plt.xlabel('years (1970 - 2016)')

    plt.ylabel('per_capita in dollars')

    plt.scatter(df.year, df.per_capita)

    # mean point of the data, drawn in red
    plt.scatter(np.mean(df.year), np.mean(df.per_capita), color='red')

    <matplotlib.collections.PathCollection at 0x7f3d8322e898>

    To draw a relation between these points, we need the straight line given by the Least Squares Method, which keeps the difference between the predicted line and the observed values as small as possible.

    The coefficients can be calculated with

    m = Σ(x - x̅)(y - ȳ) / Σ(x - x̅)²

    c = ȳ - m·x̅

    Here, (x - x̅) is the difference between each actual value of x and the mean of x (1993.0), and (y - ȳ) is the difference between each actual value of y and the mean of y (18920.1370).

    year   per_capita_income (US$)   x-x̅    y-ȳ              (x-x̅)²   (y-ȳ)(x-x̅)
    1970   3399.299037               -23    -15,520.837963   529      356,979.273149
    1971   3768.297935               -22    -15,151.839065   484      333,340.45943
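    The columns of this table can be reproduced with a few lines of NumPy. The sketch below is only an illustration: it uses the two rows shown above together with the means quoted in the text (1993.0 and 18920.1370), and the variable names are my own. Because both 1970 and its income lie below their means, both deviations are negative and their product is positive. Summing the last two columns over all rows of the full data set would give the numerator and denominator of the slope.

```python
import numpy as np

# Means quoted in the text for the full 1970-2016 data set.
X_MEAN = 1993.0
Y_MEAN = 18920.1370

# Only the two rows shown in the table.
years = np.array([1970, 1971], dtype=float)
per_capita = np.array([3399.299037, 3768.297935])

dx = years - X_MEAN        # x - x̅
dy = per_capita - Y_MEAN   # y - ȳ
dx_squared = dx ** 2       # (x - x̅)²
dy_dx = dy * dx            # (y - ȳ)(x - x̅)

for row in zip(years, per_capita, dx, dy, dx_squared, dy_dx):
    print("%d  %.6f  %.0f  %.6f  %.0f  %.6f" % row)
```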

    Once you have calculated the slope (m), in this case 828.46507522, substitute the mean values of x and y into the equation, since the best-fit line passes through the mean point:

             18920.1370 = 828.46507522 * 1993.0 + c

    Solving for c gives

       c = -1632210.7578554575

    So the equation for any point is

    y = 828.46507522 * x - 1632210.7578554575

    With this equation you can predict the per capita income for any given year.
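    The arithmetic above can be checked with a short sketch that uses only the numbers quoted in the text. The function name predict_per_capita is a hypothetical helper. Because the quoted mean of y is rounded to four decimals, the intercept agrees with the value above only to about four decimal places.

```python
# Values quoted in the text above.
slope = 828.46507522
x_mean = 1993.0
y_mean = 18920.1370

# The best-fit line passes through the mean point,
# so c = y_mean - slope * x_mean.
intercept = y_mean - slope * x_mean

def predict_per_capita(year):
    """Predicted per capita income (US$) for a given year."""
    return slope * year + intercept

print(round(intercept, 2))
print(round(predict_per_capita(2020), 2))
```

    The printed values should land very close to the intercept given above and to the 2020 prediction that sklearn reports later in this blog.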


    Or it can be implemented with Python functions as

    # covariance term: sum of the products of the deviations of x and y
    def covar(x, x_mean, y, y_mean):
        covariance = 0.0
        for i in range(len(y)):
            covariance += (x[i] - x_mean) * (y[i] - y_mean)
        return covariance

    # variance term: sum of squared deviations from the mean
    # (a plain sum rather than np.var, so it matches the un-normalised sum in covar)
    def variance(values):
        mean = np.mean(values)
        return sum((v - mean) ** 2 for v in values)

    # slope and intercept
    def coefficients(x, y):
        x_mean, y_mean = np.mean(x), np.mean(y)
        slope = covar(x, x_mean, y, y_mean) / variance(x)
        intercept = y_mean - slope * x_mean
        return [slope, intercept]

    def simple_linear_regression(df, test_values):
        predictions = []
        # plain arrays, so that x[i] indexes by position
        m, c = coefficients(df['year'].to_numpy(), df['per_capita'].to_numpy())
        for x in test_values:
            predictions.append(m * x + c)
        return predictions
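    For comparison, here is a vectorised sketch of the same least squares calculation in NumPy, cross-checked against np.polyfit. It is only an illustration: fit_line is a hypothetical helper name, and the data are just the five rows printed by df.head() earlier, so the resulting coefficients differ from those of the full data set.

```python
import numpy as np

def fit_line(x, y):
    """Least squares slope and intercept for y = slope * x + intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    slope = np.sum(dx * dy) / np.sum(dx ** 2)
    intercept = y.mean() - slope * x.mean()
    return slope, intercept

# Sample only: the five rows shown by df.head().
years = [1970, 1971, 1972, 1973, 1974]
per_capita = [3399.299037, 3768.297935, 4251.175484, 4804.463248, 5576.514583]

slope, intercept = fit_line(years, per_capita)

# np.polyfit with degree 1 fits the same straight line.
ref_slope, ref_intercept = np.polyfit(years, per_capita, 1)
print(slope, ref_slope)
print(intercept, ref_intercept)
```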

    Estimate regression equation using sklearn

    And here is how you would do it with the Python sklearn library. Import linear_model from the library and create an instance of LinearRegression.

    from sklearn import linear_model

    reg = linear_model.LinearRegression()

    # fitting per capita income as the dependent variable on year
    reg.fit(df[['year']], df.per_capita)

    Now that the model has been fitted, we can read the slope and the y-intercept of the best-fit line from reg.coef_ and reg.intercept_

    reg.coef_

    reg.intercept_

    Which outputs

    array([828.46507522])

    -1632210.7578554575
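    One detail worth knowing is that reg.coef_ is an array with one entry per feature, while reg.intercept_ is a single number. The sketch below illustrates this on a small sample; it assumes scikit-learn is installed and uses only the five rows printed by df.head() earlier, so its coefficients differ from the ones above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample only: the five rows shown by df.head().
years = np.array([1970, 1971, 1972, 1973, 1974])
per_capita = np.array([3399.299037, 3768.297935, 4251.175484,
                       4804.463248, 5576.514583])

# fit expects a 2-D feature matrix of shape (n_samples, n_features).
X = years.reshape(-1, 1)
reg = LinearRegression().fit(X, per_capita)

print(reg.coef_.shape)              # one coefficient per feature
slope = float(reg.coef_[0])         # pull out the single slope
intercept = float(reg.intercept_)
print(slope, intercept)
```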

    Now let us visualise the fitted line with matplotlib

    plt.xlabel('years (1970 - 2016)')

    plt.ylabel('per_capita (in dollars)')

    # mean point of the data, drawn in red
    plt.scatter(np.mean(df.year), np.mean(df.per_capita), color='red')

    # best-fit line predicted by the model
    plt.plot(df.year, reg.predict(df[['year']]), color='black')

    [<matplotlib.lines.Line2D at 0x7fa1daffbba8>]

    and then use the predict method to predict any value of per capita for a given year.

    reg.predict([[2020]])

    Output

    41288.69409442

    Now let's predict the per capita income for recent years (the testing data set) and store the predictions in a CSV file

    df_2 = pd.read_csv('years.csv')

    df_2.head()

     

     

           year
    0      2016
    1      2017
    2      2018
    3      2019
    4      2020

    Now store the predicted values in a new column and write the result to a new CSV file.

    predicts = reg.predict(df_2)

    df_2['predicted_per_capita'] = predicts  # creating new column

    df_2.to_csv('predictions.csv')    # creating new file

    df_2.head()    # first five rows of the file

     

     

           year   predicted_per_capita
    0      2016           37974.833794
    1      2017           38803.298869
    2      2018           39631.763944
    3      2019           40460.229019
    4      2020           41288.694094
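    The same table can be rebuilt without the fitted model by plugging the coefficients reported earlier into the line equation. The sketch below is an illustration under that assumption. It writes to an in-memory buffer instead of a file so that it runs anywhere, and it passes index=False to to_csv, an optional choice that keeps the row index (0 to 4) out of the saved file.

```python
import io
import pandas as pd

# Coefficients reported earlier by reg.coef_ and reg.intercept_.
slope = 828.46507522
intercept = -1632210.7578554575

df_new = pd.DataFrame({"year": [2016, 2017, 2018, 2019, 2020]})
df_new["predicted_per_capita"] = slope * df_new["year"] + intercept

# Write the CSV to an in-memory buffer; index=False omits the row index column.
buffer = io.StringIO()
df_new.to_csv(buffer, index=False)
print(buffer.getvalue())
```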

    You might notice a difference between the actual value for 2016 and the predicted value for 2016. This difference is the prediction error (residual); averaging the squares of these errors over all points gives the mean squared error. How well the equation fits can be measured with the R-squared method, also known as the coefficient of determination or coefficient of multiple determination. R² can be calculated with the following formula, where yp is a predicted value, y is an actual value and ȳ is the mean of the actual values.

         R² = Σ(yp - ȳ)² / Σ(y - ȳ)²

    The further R² falls below 1, the further the values lie from the regression line.
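    To make these two measures concrete, here is a minimal NumPy sketch that fits a least squares line to a small sample and then computes the mean squared error and R². It uses only the five rows printed by df.head() earlier, so the numbers describe that sample and not the full data set. R² is computed both as 1 minus the ratio of residual to total variation and as the ratio of explained to total variation; for a least squares line with an intercept the two forms agree.

```python
import numpy as np

# Sample only: the five rows shown by df.head().
x = np.array([1970, 1971, 1972, 1973, 1974], dtype=float)
y = np.array([3399.299037, 3768.297935, 4251.175484,
              4804.463248, 5576.514583])

# Least squares slope and intercept.
dx = x - x.mean()
slope = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)
intercept = y.mean() - slope * x.mean()
y_pred = slope * x + intercept

residuals = y - y_pred
mse = np.mean(residuals ** 2)              # mean squared error

ss_total = np.sum((y - y.mean()) ** 2)     # total variation in y
ss_residual = np.sum(residuals ** 2)       # variation left unexplained
ss_explained = np.sum((y_pred - y.mean()) ** 2)

r2 = 1.0 - ss_residual / ss_total
r2_alt = ss_explained / ss_total           # same value for this kind of fit
print(mse, r2, r2_alt)
```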

    This was all about the linear regression algorithm, with an example of predicting the per capita income of the US for several years from a training data set.





