+1 vote
in Programming Languages by (51.8k points)

I am trying to generate synthetic data for clustering using the make_blobs() function from scikit-learn. I want all values in the data to be positive, but make_blobs() returns both positive and negative values. How can I generate data with only positive values?

1 Answer

+2 votes
by (281k points)
Best answer

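With the default parameters, center_box=(-10.0, 10.0) and cluster_std=1.0, make_blobs() draws the cluster centers from a box that straddles zero, so the generated data contains both positive and negative values. By moving center_box well above zero and keeping cluster_std small relative to that distance, you can generate data that is positive in practice. The noise is Gaussian, so this is not a strict guarantee, but negative values become vanishingly unlikely.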

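For example, you can set center_box=(20, 20) and cluster_std=2. Because both bounds are equal, every cluster center lands at 20 on each axis, so the three clusters overlap; use a wider box such as center_box=(20, 40) if you want the centers spread apart.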

>>> from sklearn.datasets import make_blobs
>>> X, y = make_blobs(n_samples=10, centers=3, random_state=42, center_box=(20, 20), cluster_std=2)
>>> X
array([[22.93129754, 19.5484474 ],
       [18.91123455, 20.22184518],
       [17.97433776, 20.62849467],
       [20.13505641, 17.15050363],
       [20.48392454, 16.17343951],
       [19.06105123, 21.08512009],
       [23.15842563, 21.53486946],
       [16.55016433, 18.87542494],
       [18.18395185, 17.1753926 ],
       [19.07316461, 19.06854049]])
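
If you need separated clusters and a hard guarantee of positivity, one possible approach is sketched below. The helper name make_positive_blobs, the box (20, 40), and the small offset eps are illustrative assumptions, not part of scikit-learn. The function draws centers from a wider positive box and, as a safety net, shifts the whole data set upward if any value still comes out non-positive. A uniform shift does not change the distances between points, so the cluster structure is preserved.

```python
from sklearn.datasets import make_blobs


def make_positive_blobs(n_samples=300, centers=3, cluster_std=2.0,
                        center_box=(20.0, 40.0), random_state=42, eps=1e-6):
    """Hypothetical helper: blobs whose values are all strictly positive."""
    X, y = make_blobs(
        n_samples=n_samples,
        centers=centers,
        cluster_std=cluster_std,
        center_box=center_box,      # centers drawn uniformly from this range
        random_state=random_state,
    )
    # Gaussian noise is unbounded, so check the minimum explicitly and
    # shift every value by the same amount if anything is <= 0.
    min_val = X.min()
    if min_val <= 0:
        X = X - min_val + eps
    return X, y


X, y = make_positive_blobs()
print(X.min() > 0)   # every value is strictly positive
print(X.shape)       # (n_samples, 2), since n_features defaults to 2
```

With the settings above the shift is unlikely to be triggered at all; it only matters if you choose a center_box close to zero or a large cluster_std.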


...