Sklearn Logistic Regression With Continuous Y

Logistic Regression with Scikit-Learn


Logistic regression is a machine learning model that helps predict the probability of an occurrence (called a 'class').

In this tutorial, we will use Scikit-Learn and its logistic regression primitives to predict the likelihood of a good night's sleep based on the number of sleep hours and awakenings.


    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.linear_model import LogisticRegression

    plt.rcParams['figure.figsize'] = [8, 7]
    plt.rcParams['figure.dpi'] = 100

Logistic Regression Model

A logistic regression is a kind of model that, unlike regular linear regression, predicts the probability of a certain class (e.g., raining) being true, as a number between 0 and 1:

X -> y where 1.0 >= y >= 0.0

In a typical binary (or binomial) classification, each set of features is labelled with a true or false value, as opposed to a continuous value. The predicted probability ranges from absolutely false (0.0) to absolutely true (1.0).
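To make the mapping from features to a probability concrete, here is a minimal sketch of the logistic (sigmoid) function applied to a linear score. The weights w and intercept b are hypothetical values chosen only for illustration; they are not the ones the tutorial's model learns.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights and intercept (assumptions, not fitted values)
w = np.array([1.5, -2.0])
b = -3.0

x = np.array([7.0, 2.0])   # e.g. 7 hours of sleep, 2 awakenings
score = w @ x + b          # linear part: 1.5*7 - 2.0*2 - 3.0 = 3.5
p = sigmoid(score)         # probability of the positive class
print(p)
```

A score of zero maps to a probability of exactly 0.5, large positive scores approach 1.0, and large negative scores approach 0.0.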

Let's get into action. Consider a data set in which patients who suffer from sleep problems report the number of hours they slept, how many times they woke up during the night, and how well they feel in the morning, recorded as 'slept badly' (False) or 'slept well' (True):

    np.random.seed(3)
    sleep_hours = np.linspace(1, 9, 100)
    # size=1001 yields more values than needed; zip() below stops after the first 100
    awakenings = np.random.randint(1, 6, size=1001)
    # Sleep quality grows with hours slept, shrinks with awakenings, plus some noise
    sleep_q = [6 + h + (np.random.rand() * 3) - a
               for (h, a) in zip(sleep_hours, awakenings)]
    X = np.array([[h, a] for (h, a) in zip(sleep_hours, awakenings)])
    y = [q >= 12 for q in sleep_q]

    # Split the points by label for plotting
    good = [(h, a) for (h, a, well) in zip(sleep_hours, awakenings, y) if well]
    bad = [(h, a) for (h, a, well) in zip(sleep_hours, awakenings, y) if not well]

    plt.scatter([t[0] for t in good], [t[1] for t in good], color='b', label='Slept Well')
    plt.scatter([t[0] for t in bad], [t[1] for t in bad], color='r', label='Slept Badly')
    plt.legend(loc='upper left')
    plt.xlabel('Sleep Hours')
    plt.ylabel('Awakenings')
    plt.yticks([1, 2, 3, 4, 5])
    plt.show()

In this example, we won't split the data set into training and test sets, but train the model on the entire data set:

    model = LogisticRegression().fit(X, y)
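For comparison, here is a rough sketch of what a held-out evaluation could look like with train_test_split. The data is a stand-in generated with the same recipe as the tutorial but a different random generator, and the names X_demo, y_demo and held_out_model are illustrative only, so the numbers will not match the tutorial's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-in data following the tutorial's recipe (assumption: different RNG)
rng = np.random.default_rng(3)
hours = np.linspace(1, 9, 100)
wakes = rng.integers(1, 6, size=100)
quality = 6 + hours + rng.random(100) * 3 - wakes
X_demo = np.column_stack([hours, wakes])
y_demo = quality >= 12

# Hold out 25% of the rows, keeping the class ratio similar in both parts
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo)

held_out_model = LogisticRegression().fit(X_train, y_train)
train_acc = held_out_model.score(X_train, y_train)
test_acc = held_out_model.score(X_test, y_test)
print(train_acc, test_acc)
```

The accuracy on the held-out rows is the better estimate of how the model would behave on patients it has not seen.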

With the model fitted on the entire data set, we can now interrogate it. The regular predict() method returns the class, in this case a bool answer, as opposed to a float value. We use predict_proba() instead to obtain the actual probability for each class (False and True, in our case).
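The relationship between the two methods can be sketched on a tiny toy data set (the values in X_toy and y_toy are made up for illustration). The columns of predict_proba() follow the order of the fitted model's classes_ attribute, and predict() returns the class with the highest probability.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up toy data: [sleep hours, awakenings] -> slept well?
X_toy = np.array([[2.0, 4], [3.0, 5], [4.0, 3], [7.0, 1], [8.0, 2], [9.0, 1]])
y_toy = np.array([False, False, False, True, True, True])
toy_model = LogisticRegression().fit(X_toy, y_toy)

samples = np.array([[3.0, 4], [8.5, 1]])
proba = toy_model.predict_proba(samples)

print(toy_model.classes_)   # column order of proba: False first, then True
print(proba.sum(axis=1))    # each row of probabilities sums to 1

# Picking the most probable class per row reproduces predict()
from_proba = toy_model.classes_[np.argmax(proba, axis=1)]
print(from_proba, toy_model.predict(samples))
```

This is why the second column, index 1, is the one to read when asking how likely a good night's sleep is.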

Example 1: Patient slept 3.2 hours, and woke up 0 times. Was his sleep good?

    model.predict([[3.2, 0]])

    model.predict_proba([[3.2, 0]])

    array([[0.89892622, 0.10107378]])

Example 2: Patient slept 8 hours, but woke up 10 times. Was her sleep good?

    model.predict([[8.0, 10]])

    model.predict_proba([[8.0, 10]])

    array([[9.99984809e-01, 1.51910314e-05]])

Example 3: Patient slept 10 hours, and woke up 0 times. Was her sleep good?

    model.predict([[10.0, 0]])

    array([ True])

    model.predict_proba([[10.0, 0]])

    array([[2.42600004e-04, 9.99757400e-01]])

In this data set, albeit synthetic, it is interesting to note how the number of awakenings influences the chances of a good night's sleep. This is where the probability value produced by predict_proba() shines: we can plot it against the hours slept, one panel per number of awakenings.

    def probability(awakenings):
        zero = [[h, model.predict_proba([[h, awakenings]])[0][1]]
                for h in np.linspace(0, 12, 12)]
        plt.plot([t[0] for t in zero], [t[1] for t in zero])
        if awakenings >= 3:
            plt.xlabel("Sleep hours")
        if awakenings in [0, 3]:
            plt.ylabel("Sleep well probability")
        plt.yticks([0.0, 0.2, 0.4, 0.6, 0.8, 1])
        plt.xticks([0, 2, 4, 6, 8, 10, 12])
        plt.title("# Awakenings = {}".format(awakenings))

    for x in range(0, 6):
        plt.subplot(2, 3, x + 1)
        probability(x)


A logistic regression uses a linear model which can be regularised, just like the regular regression models that produce continuous values. Let us first take a look at the class boundary for the model we were working on, which uses Scikit-Learn's default settings (an L2 penalty with C=1.0).

    def show_class_boundary(models):
        plt.scatter([t[0] for t in good], [t[1] for t in good], color='b', label='Slept Well')
        plt.scatter([t[0] for t in bad], [t[1] for t in bad], color='r', label='Slept Badly')
        plt.xlabel('Sleep Hours')
        plt.ylabel('Awakenings')
        plt.yticks([1, 2, 3, 4, 5])
        for i, m in enumerate(models):
            class_boundary = [(h, a)
                              for h in np.linspace(0, 10, 200)
                              for a in np.linspace(0, 6, 200)
                              if abs((m.predict_proba([[h, a]])[0][1]) - 0.5)
                              <= 0.001]
            plt.plot([t[0] for t in class_boundary], [t[1] for t in class_boundary], label="Class Boundary {}".format(i + 1))
        plt.legend(loc='upper left')

    show_class_boundary([model])
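The grid search in show_class_boundary() calls predict_proba() 40,000 times per model. Because the model is linear, the 0.5 boundary can also be computed directly from the fitted coefficients. The following is a sketch of that alternative on stand-in data (same recipe as the tutorial, different random generator; demo_model and the other names are illustrative), not the approach the tutorial itself uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data following the tutorial's recipe (assumption: different RNG)
rng = np.random.default_rng(3)
hours = np.linspace(1, 9, 100)
wakes = rng.integers(1, 6, size=100)
X_demo = np.column_stack([hours, wakes])
y_demo = (6 + hours + rng.random(100) * 3 - wakes) >= 12
demo_model = LogisticRegression().fit(X_demo, y_demo)

# P(True) equals 0.5 exactly where w_h * h + w_a * a + b = 0
(w_h, w_a), b = demo_model.coef_[0], demo_model.intercept_[0]

h_line = np.linspace(0, 10, 50)
a_line = -(w_h * h_line + b) / w_a   # solve the equation for awakenings

# Check: every point on the line should get a probability of about 0.5
on_line = np.column_stack([h_line, a_line])
p_line = demo_model.predict_proba(on_line)[:, 1]
print(p_line.min(), p_line.max())
```

Passing h_line and a_line to plt.plot() would draw the same kind of boundary line as the grid search, in a single call.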

In the visualisation produced by show_class_boundary([model]), we can see that the class boundary line covers at least one red dot. By real-world standards, this model is almost perfect, since the synthetic data set we have provided is close to linear. But let's suppose that we wanted this logistic regression to be more conservative, so that if the prediction is true (the patient has slept well), false positives are as unlikely as possible.

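A way to attempt this is to strengthen the regularisation penalty (it is L2 by default, but L1 can be selected too, with a solver that supports it) using the C argument. C is the inverse of the regularisation strength, so smaller values mean a stronger penalty; the default is 1.0. In the example below, we set C=0.2, which here has the effect of 'pushing' the class boundary to the right so that it no longer covers red dots.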

    model2 = LogisticRegression(C=0.2).fit(X, y)
    show_class_boundary([model, model2])
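To see what C does to the fitted parameters, here is a small sketch that fits the same stand-in data (tutorial recipe, different random generator; all names are illustrative) with several values of C and prints the coefficients. With the L2 penalty, smaller C values should shrink the coefficient vector towards zero.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in data following the tutorial's recipe (assumption: different RNG)
rng = np.random.default_rng(3)
hours = np.linspace(1, 9, 100)
wakes = rng.integers(1, 6, size=100)
X_demo = np.column_stack([hours, wakes])
y_demo = (6 + hours + rng.random(100) * 3 - wakes) >= 12

norms = {}
for C in (100.0, 1.0, 0.2, 0.01):
    # max_iter is raised so the weakly regularised fits have room to converge
    m = LogisticRegression(C=C, max_iter=1000).fit(X_demo, y_demo)
    norms[C] = np.linalg.norm(m.coef_)
    print(C, m.coef_[0], m.intercept_[0])
```

Smaller coefficients make the probability curve flatter, and because the intercept is adjusted as well, the position of the 0.5 boundary can shift too.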

The change in penalty is small, so it doesn't alter the accuracy score, which is already high at 0.96 for both models:

    display(model.score(X, y))
    display(model2.score(X, y))

    0.96
    0.96
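The score() method reports mean accuracy, which does not say whether the mistakes are false positives or false negatives. A confusion matrix separates the two. The sketch below uses stand-in data (tutorial recipe, different random generator; all names are illustrative), so its counts will differ from the tutorial's models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# Stand-in data following the tutorial's recipe (assumption: different RNG)
rng = np.random.default_rng(3)
hours = np.linspace(1, 9, 100)
wakes = rng.integers(1, 6, size=100)
X_demo = np.column_stack([hours, wakes])
y_demo = (6 + hours + rng.random(100) * 3 - wakes) >= 12

demo_model = LogisticRegression().fit(X_demo, y_demo)
pred = demo_model.predict(X_demo)

# score() is the fraction of predictions that match the labels
accuracy = demo_model.score(X_demo, y_demo)
print(accuracy, np.mean(pred == y_demo))

# Rows are true labels, columns are predictions, in the order given by labels
tn, fp, fn, tp = confusion_matrix(y_demo, pred, labels=[False, True]).ravel()
print("false positives:", fp, "false negatives:", fn)
```

Two models with the same accuracy can still differ in how their errors are divided between the two kinds, which is what matters when false positives are the concern.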


Logistic regression is one of the simplest classification models, and it is most useful when the separation between the classes is roughly linear. In future tutorials, we will explore how to treat classes whose arrangement appears to be more arbitrary.
