Friday 5 April 2024

NumPy Joining Array

In NumPy, joining arrays refers to combining the elements of multiple arrays into a single new array. There are two main ways to achieve this:

  1. Concatenation: This involves joining arrays along a specified axis. The most common function for concatenation is np.concatenate. It takes a sequence of arrays as its first argument and optionally the axis along which to join them. By default, concatenation happens along axis 0 (rows for 2D arrays).

Here's an example of concatenating two arrays:
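
A minimal sketch, assuming two small 2-D arrays (the names a and b are only for illustration). It joins them first along the default axis 0 and then along axis 1:

```python
import numpy as np

# Two hypothetical 2x2 arrays used only for illustration.
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# Default axis=0: the rows of b are appended below the rows of a.
rows = np.concatenate((a, b))
print(rows.shape)  # (4, 2)

# axis=1: the columns of b are appended to the right of a.
cols = np.concatenate((a, b), axis=1)
print(cols.shape)  # (2, 4)
```

The arrays must have the same shape in every dimension except the one being joined.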

 


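  2. Stacking: This is similar to concatenation but with a key difference. np.stack joins arrays along a new axis, so the result has one more dimension than the inputs. NumPy also provides convenience functions that join along a fixed axis, promoting lower-dimensional inputs where needed:
  • np.hstack: Stacks arrays horizontally (column-wise). This is concatenation along axis 1, or along axis 0 for 1-D arrays.
  • np.vstack: Stacks arrays vertically (row-wise). This is concatenation along axis 0, after 1-D arrays of shape (N,) are reshaped to (1, N).
  • np.dstack: Stacks arrays along depth (useful for 3D arrays). This is concatenation along axis 2, after 1-D and 2-D arrays are promoted to 3-D.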

Here are examples
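
A short sketch of the stacking functions, assuming two 1-D arrays named x and y (hypothetical names). The shapes in the comments show where each function places the joined data:

```python
import numpy as np

# Two hypothetical 1-D arrays of length 3.
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

s0 = np.stack((x, y))          # new axis 0, shape (2, 3)
s1 = np.stack((x, y), axis=1)  # new axis 1, shape (3, 2)
h = np.hstack((x, y))          # 1-D inputs are joined end to end, shape (6,)
v = np.vstack((x, y))          # each input becomes one row, shape (2, 3)
d = np.dstack((x, y))          # inputs are promoted to 3-D, shape (1, 3, 2)

for name, arr in [("stack", s0), ("stack axis=1", s1),
                  ("hstack", h), ("vstack", v), ("dstack", d)]:
    print(name, arr.shape)
```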

NumPy Array Iterating

You can go through NumPy arrays in two main ways:

1. Using a for loop: This is the simplest way to visit each item in a NumPy array. Looping over the array directly yields one element per iteration for a 1-D array, and one sub-array along the first axis (for example, one row of a 2-D array) otherwise.

2. Indexing with a for loop: You can also loop over index positions and look up each item by its index. This method works well when you need both the value and the index of the current element.
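
The two approaches can be sketched as follows, assuming a small 1-D array vec and a 2-D array grid (both names are hypothetical). The last line uses np.ndenumerate, which yields each index together with its value:

```python
import numpy as np

vec = np.array([10, 20, 30])

# 1. Plain for loop: each iteration yields the current element.
values = []
for value in vec:
    values.append(int(value))

# 2. Loop over index positions: gives both the index and the value.
pairs = []
for i in range(len(vec)):
    pairs.append((i, int(vec[i])))

# For a 2-D array, the outer loop yields rows and the inner loop yields items.
grid = np.array([[1, 2], [3, 4]])
visited = []
for row in grid:
    for value in row:
        visited.append(int(value))

# np.ndenumerate yields (index tuple, value) pairs for any number of dimensions.
indexed = [(idx, int(v)) for idx, v in np.ndenumerate(grid)]

print(values, pairs, visited, indexed)
```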



Both methods go through the array in row-major order, which is the default (C-style) layout in NumPy. For a 2-D array this means the first row is visited in full before the second row, and so on.

Besides these simple approaches, NumPy also provides the np.nditer iterator object for more complex iteration. It gives you more control, such as choosing the iteration order, modifying elements in place, and iterating over several arrays together.

nditer is the way to go if you need these more advanced features when going through NumPy arrays. But a simple for loop, with or without indexing, will do for most simple situations.
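
A brief sketch of np.nditer, assuming a 2x3 array named grid (a hypothetical name). It visits every element in row-major and column-major order, then doubles a copy in place:

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Visit every element in row-major (C) order, regardless of dimensions.
c_order = [int(x) for x in np.nditer(grid, order="C")]

# Visit the same elements in column-major (Fortran) order.
f_order = [int(x) for x in np.nditer(grid, order="F")]

# Modify elements in place: request read-write access to the operand.
doubled = grid.copy()
with np.nditer(doubled, op_flags=["readwrite"]) as it:
    for x in it:
        x[...] = 2 * x

print(c_order)
print(f_order)
print(doubled)
```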

Friday 1 March 2024

Pruning in Decision Tree

Pruning is a method used in decision tree algorithms to avoid overfitting and improve the model's ability to generalize. Overfitting happens when a decision tree is too complex and captures irrelevant details of the training data instead of the underlying patterns. Pruning eliminates parts of the tree that have little predictive value, resulting in a simpler and easier-to-understand tree.

There are two primary forms of pruning:

  • Pre-pruning (early stopping) prunes the tree during its construction. At each node, the method assesses whether splitting the node further would improve performance on validation data. If not, the node is made a leaf and is not split further.
  • Post-pruning, often simply called pruning, consists of constructing the full tree and then removing nodes that do not contribute significantly to predictive power. This is usually accomplished by methods such as cost-complexity pruning.
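
As a rough illustration of both forms, here is a sketch that assumes scikit-learn is installed and uses its bundled iris data set (neither is part of the original post). Note that scikit-learn's pre-pruning uses fixed growth limits such as max_depth rather than a per-node validation check, and the choice of the middle alpha value is arbitrary:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Unpruned tree, grown until the leaves are pure.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: limit growth while the tree is being built.
pre = DecisionTreeClassifier(max_depth=2, min_samples_leaf=5,
                             random_state=0).fit(X_train, y_train)

# Post-pruning: cost-complexity pruning. The pruning path lists the
# alpha values at which successive subtrees of the full tree collapse.
path = full.cost_complexity_pruning_path(X_train, y_train)
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # arbitrary pick
post = DecisionTreeClassifier(ccp_alpha=alpha,
                              random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("pre-pruned", pre), ("post-pruned", post)]:
    print(name, model.get_n_leaves(), model.score(X_val, y_val))
```

In practice the alpha value would be chosen by comparing validation scores across the whole path instead of taking the middle entry.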

Decision trees that are trained on any training data run the risk of overfitting the training data.

What we mean by this is that eventually each leaf will represent a very specific set of attribute combinations seen in the training data, and the tree will consequently be unable to correctly classify attribute value combinations that are not seen in the training data.

In order to prevent this from happening, we must prune the decision tree.

By pruning we mean that the lower ends (the leaves) of the tree are “snipped” until the tree is much smaller. The figure below shows an example of a full tree, and the same tree after it has been pruned to have only 4 leaves.

[Figure: Pruned decision tree. The figure to the right is a pruned version of the decision tree to the left.]

Pruning can be performed in many ways. Here are two.

Pruning by Information Gain

The simplest technique is to prune out portions of the tree that result in the least information gain. This procedure does not require any additional data, and only bases the pruning on the information that is already computed when the tree is being built from training data.

The process of IG-based pruning requires us to identify “twigs”, nodes whose children are all leaves. “Pruning” a twig removes all of the leaves which are the children of the twig, and makes the twig a leaf. The figure below illustrates this.

[Figure: Pruning. Pruning the encircled twig in the left figure results in the tree to the right. The twig now becomes a leaf.]

The algorithm for pruning is as follows:

  1. Catalog all twigs in the tree.
  2. Count the total number of leaves in the tree.
  3. While the number of leaves in the tree exceeds the desired number:
    1. Find the twig with the least Information Gain
    2. Remove all child nodes of the twig.
    3. Relabel twig as a leaf.
    4. Update the leaf count.
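
The steps above can be sketched in plain Python. This is only an illustration under assumptions: the Node class, its info_gain field (the gain of the split made at that node, assumed to be stored while the tree was built) and the function names are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                  # majority class at this node
    info_gain: float = 0.0      # gain of this node's split (unused for leaves)
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children

    def is_twig(self):
        # A twig is an internal node whose children are all leaves.
        return bool(self.children) and all(c.is_leaf() for c in self.children)

def walk(node):
    """Yield every node in the tree rooted at node."""
    yield node
    for child in node.children:
        yield from walk(child)

def count_leaves(root):
    return sum(1 for n in walk(root) if n.is_leaf())

def prune_by_info_gain(root, max_leaves):
    """Collapse the lowest-gain twig until at most max_leaves leaves remain."""
    while count_leaves(root) > max_leaves:
        twigs = [n for n in walk(root) if n.is_twig()]
        if not twigs:            # the tree is already a single leaf
            break
        weakest = min(twigs, key=lambda n: n.info_gain)
        weakest.children = []    # the twig becomes a leaf with its own label
    return root

# Hypothetical tree with two twigs and four leaves.
twig_a = Node("yes", 0.1, [Node("yes"), Node("no")])
twig_b = Node("no", 0.3, [Node("no"), Node("yes")])
root = Node("yes", 0.5, [twig_a, twig_b])

prune_by_info_gain(root, max_leaves=3)
print(count_leaves(root))  # 3
```

For simplicity the sketch recomputes the twig list and leaf count on every pass instead of updating them incrementally, which is fine for small trees.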


Pruning by Classification Performance on Validation Set

An alternative approach is to prune the tree to maximize classification performance on a validation set (a data set with known labels, which was not used to train the tree).

We pass the validation data down the tree. At each node, we record the total number of instances and the number of misclassifications, if that node were actually a leaf. We do this at all nodes and leaves.

Subsequently, we repeatedly prune the twig whose removal results in the smallest overall increase in classification error on the validation set.

The overall algorithm for pruning is as follows:

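Stage 1:

  1. For each instance of validation data:
      Recursively pass the instance down the tree. At each node it reaches, update the instance count and, if the node's label differs from the instance's label, the misclassification count.

Stage 2:

  1. Catalog all twigs in the tree.
  2. Count the total number of leaves in the tree.
  3. While the number of leaves in the tree exceeds the desired number:
    1. Find the twig whose pruning gives the smallest increase in classification error.
    2. Remove all child nodes of the twig.
    3. Relabel twig as a leaf.
    4. Update the leaf count.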
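
Validation-based pruning, as described above, might look like the following sketch. It assumes a binary tree with numeric threshold splits; the Node class, its fields and the tiny validation set are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    label: int                       # majority training label at this node
    feature: int = 0                 # index of the feature tested here
    threshold: float = 0.0           # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    seen: int = 0                    # validation instances reaching this node
    errors: int = 0                  # misclassifications if it were a leaf

    def is_leaf(self):
        return self.left is None

    def is_twig(self):
        return not self.is_leaf() and self.left.is_leaf() and self.right.is_leaf()

def record(node, x, y):
    """Stage 1: pass one instance down the tree, updating counts on the way."""
    node.seen += 1
    node.errors += int(node.label != y)
    if not node.is_leaf():
        child = node.left if x[node.feature] <= node.threshold else node.right
        record(child, x, y)

def walk(node):
    yield node
    if not node.is_leaf():
        yield from walk(node.left)
        yield from walk(node.right)

def count_leaves(root):
    return sum(1 for n in walk(root) if n.is_leaf())

def error_increase(twig):
    # Extra validation errors incurred if the twig replaced its two leaves.
    return twig.errors - (twig.left.errors + twig.right.errors)

def prune_by_validation(root, X_val, y_val, max_leaves):
    for x, y in zip(X_val, y_val):
        record(root, x, y)
    # Stage 2: collapse the twig that costs the least validation accuracy.
    while count_leaves(root) > max_leaves:
        twigs = [n for n in walk(root) if n.is_twig()]
        if not twigs:
            break
        best = min(twigs, key=error_increase)
        best.left = best.right = None
    return root

# Hypothetical three-leaf tree and four validation instances.
twig = Node(0, feature=1, threshold=0.5, left=Node(0), right=Node(1))
root = Node(0, feature=0, threshold=0.5, left=twig, right=Node(1))
X_val = [(0.2, 0.1), (0.3, 0.9), (0.8, 0.4), (0.9, 0.7)]
y_val = [0, 0, 1, 1]

prune_by_validation(root, X_val, y_val, max_leaves=2)
print(count_leaves(root), twig.errors)
```

Because the counts at each node do not depend on what happens below it, the Stage 1 counts stay valid after each twig is collapsed, so the validation data only needs to be passed down the tree once.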

Other forms of pruning

Pruning may also use other criteria, e.g. minimizing computational complexity, or using other techniques, e.g. randomized pruning of entire subtrees.
