Parameters:
array : [array_like] Input array.
order : ['C', 'F', 'A'; optional]
'C' means C-contiguous order in memory (last index varies the fastest); operating row-wise on the array will be slightly quicker.
'F' means Fortran-contiguous order in memory (first index varies the fastest); column-wise operations will be faster.
'A' means to read/write the elements in Fortran-like index order if the array is Fortran-contiguous in memory, C-like order otherwise.

Return:
Flattened array having the same type as the input array, and order as per choice.
Code 1: Shows that array.ravel() is equivalent to reshape(-1, order=order)
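The code for this example was lost in formatting; here is a minimal reconstruction consistent with the output below (the input is assumed to be np.arange(15).reshape(3, 5), matching the printed array):

```python
import numpy as np

# assumed input: a 3x5 array of 0..14, matching the printed output below
array = np.arange(15).reshape(3, 5)
print("Original array:\n", array)

# ravel() flattens the array
print("ravel():", array.ravel())

# reshape(-1) produces the same flattened result
print("Reshaping array:", array.reshape(-1))
```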

Output:

Original array:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
ravel(): [ 0  1  2 ..., 12 13 14]
numpy.ravel() == numpy.reshape(-1)
Reshaping array: [ 0  1  2 ..., 12 13 14]

Code 2: Shows order manipulation
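The code for this example was also lost; a sketch consistent with the output below (the shape of array2 matches np.arange(12).reshape(2, 3, 2).swapaxes(1, 2); which exact order argument the original used for the non-contiguous array is an assumption, 'K' reproduces the printed memory-order result):

```python
import numpy as np

array = np.arange(15).reshape(3, 5)
print("Original array:\n", array)
print("numpy.ravel():", np.ravel(array))
# for a C-contiguous array, order='A' reads in C order
print("Maintains A Order:", np.ravel(array, order='A'))

# a non-contiguous view whose memory layout differs from its C order
array2 = np.arange(12).reshape(2, 3, 2).swapaxes(1, 2)
print("array2\n", array2)
# order='K' follows the order of the elements in memory
print("Maintains A Order:", np.ravel(array2, order='K'))
```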
Output:

Original array:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]
About numpy.ravel():
numpy.ravel(): [ 0  1  2 ..., 12 13 14]
Maintains A Order: [ 0  1  2 ..., 12 13 14]
array2
[[[ 0  2  4]
  [ 1  3  5]]
 [[ 6  8 10]
  [ 7  9 11]]]
Maintains A Order: [ 0  1  2 ..., 9 10 11]

Links:
https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.ravel.html#numpy.ravel

Notes:
These codes will not work in online IDEs. Please run them on your system to see how they work.
This article is provided by Mohit Gupta_OMG
numpy.ravel() in Python: StackOverflow Questions
What is the difference between flatten and ravel functions in numpy?
import numpy as np
y = np.array(((1, 2, 3), (4, 5, 6), (7, 8, 9)))

OUTPUT:
print(y.flatten())
[1 2 3 4 5 6 7 8 9]
print(y.ravel())
[1 2 3 4 5 6 7 8 9]
Both functions return the same flattened array. Then what is the need for two different functions performing the same job?
List to array conversion to use ravel() function
I have a list in python and I want to convert it to an array to be able to use
ravel()
function.

Answer #1
The current API is that:

flatten always returns a copy.

ravel returns a view of the original array whenever possible. This isn't visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.

reshape((-1,)) gets a view whenever the strides of the array allow it, even if that means you don't always get a contiguous array.

Answer #2
Change this line:
model = forest.fit(train_fold, train_y)
to:
model = forest.fit(train_fold, train_y.values.ravel())
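The effect of `.values.ravel()` can be seen in isolation (a sketch with a hypothetical single-column DataFrame; the column name `y` is made up):

```python
import pandas as pd

# hypothetical single-column target, as typically read from a CSV
train_y = pd.DataFrame({"y": [0, 1, 0, 1]})

print(train_y.values.shape)          # a 2-d column vector, shape (4, 1)
print(train_y.values.ravel().shape)  # flattened to shape (4,), which is
                                     # what scikit-learn estimators expect
```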
Edit:

.values will give the values in an array (shape: (n, 1)).

.ravel will convert that array's shape to (n, ).

Answer #3
Distribution Fitting with Sum of Square Error (SSE)
This is an update and modification to Saullo's answer that uses the full list of the current scipy.stats distributions and returns the distribution with the least SSE between the distribution's histogram and the data's histogram.

Example Fitting
Using the El Niño dataset from statsmodels, the distributions are fit and error is determined. The distribution with the least error is returned.

All Distributions
Best Fit Distribution
Example Code
%matplotlib inline

import warnings
import numpy as np
import pandas as pd
import scipy.stats as st
import statsmodels.api as sm
from scipy.stats._continuous_distns import _distn_names
import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')

# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0

    # Best holders
    best_distributions = []

    # Estimate distribution parameters from data
    for ii, distribution in enumerate([d for d in _distn_names if not d in ['levy_stable', 'studentized_range']]):

        print("{:>3} / {:<3}: {}".format(ii + 1, len(_distn_names), distribution))

        distribution = getattr(st, distribution)

        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore')

                # fit dist to data
                params = distribution.fit(data)

                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]

                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
                sse = np.sum(np.power(y - pdf, 2.0))

                # if axis passed in, add to plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass

                # identify if this distribution is better
                best_distributions.append((distribution, params, sse))

        except Exception:
            pass

    return sorted(best_distributions, key=lambda x: x[2])

def make_pdf(dist, params, size=10000):
    """Generate distribution's Probability Distribution Function"""

    # Separate parts of parameters
    arg = params[:-2]
    loc = params[-2]
    scale = params[-1]

    # Get sane start and end points of distribution
    start = dist.ppf(0.01, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.01, loc=loc, scale=scale)
    end = dist.ppf(0.99, *arg, loc=loc, scale=scale) if arg else dist.ppf(0.99, loc=loc, scale=scale)

    # Build PDF and turn into pandas Series
    x = np.linspace(start, end, size)
    y = dist.pdf(x, loc=loc, scale=scale, *arg)
    pdf = pd.Series(y, x)

    return pdf

# Load data from statsmodels datasets
data = pd.Series(sm.datasets.elnino.load_pandas().data.set_index('YEAR').values.ravel())

# Plot for comparison
plt.figure(figsize=(12, 8))
ax = data.plot(kind='hist', bins=50, density=True, alpha=0.5,
               color=list(matplotlib.rcParams['axes.prop_cycle'])[1]['color'])

# Save plot limits
dataYLim = ax.get_ylim()

# Find best fit distribution
best_distributions = best_fit_distribution(data, 200, ax)
best_dist = best_distributions[0]

# Update plots
ax.set_ylim(dataYLim)
ax.set_title(u'El Niño sea temp.\n All Fitted Distributions')
ax.set_xlabel(u'Temp (°C)')
ax.set_ylabel('Frequency')

# Make PDF with best params
pdf = make_pdf(best_dist[0], best_dist[1])

# Display
plt.figure(figsize=(12, 8))
ax = pdf.plot(lw=2, label='PDF', legend=True)
data.plot(kind='hist', bins=50, density=True, alpha=0.5, label='Data', legend=True, ax=ax)

param_names = (best_dist[0].shapes + ', loc, scale').split(', ') if best_dist[0].shapes else ['loc', 'scale']
param_str = ', '.join(['{}={:0.2f}'.format(k, v) for k, v in zip(param_names, best_dist[1])])
dist_str = '{}({})'.format(best_dist[0].name, param_str)

ax.set_title(u'El Niño sea temp. with best fit distribution \n' + dist_str)
ax.set_xlabel(u'Temp. (°C)')
ax.set_ylabel('Frequency')
Answer #4
Disclaimer: I'm mostly writing this post with syntactical considerations and general behaviour in mind. I'm not familiar with the memory and CPU aspect of the methods described, and I aim this answer at those who have reasonably small sets of data, such that the quality of the interpolation can be the main aspect to consider. I am aware that when working with very large data sets, the better-performing methods (namely griddata and RBFInterpolator without a neighbors keyword argument) might not be feasible.

Note that this answer uses the new RBFInterpolator class introduced in SciPy 1.7.0. For the legacy Rbf class see the previous version of this answer.

I'm going to compare three kinds of multidimensional interpolation methods (interp2d/splines, griddata and RBFInterpolator). I will subject them to two kinds of interpolation tasks and two kinds of underlying functions (points from which are to be interpolated). The specific examples will demonstrate two-dimensional interpolation, but the viable methods are applicable in arbitrary dimensions. Each method provides various kinds of interpolation; in all cases I will use cubic interpolation (or something close^{1}). It's important to note that whenever you use interpolation you introduce bias compared to your raw data, and the specific methods used affect the artifacts that you will end up with. Always be aware of this, and interpolate responsibly.

The two interpolation tasks will be
- upsampling (input data is on a rectangular grid, output data is on a denser grid)
- interpolation of scattered data onto a regular grid

The two functions (over the domain [x, y] in [-1, 1]x[-1, 1]) will be

- a smooth and friendly function: cos(pi*x)*sin(pi*y); range in [-1, 1]
- an evil (and in particular, non-continuous) function: x*y / (x^2 + y^2) with a value of 0.5 near the origin; range in [-0.5, 0.5]
Here's how they look:

I will first demonstrate how the three methods behave under these four tests, then I'll detail the syntax of all three. If you know what you should expect from a method, you might not want to waste your time learning its syntax (looking at you, interp2d).

Test data
For the sake of explicitness, here is the code with which I generated the input data. While in this specific case I'm obviously aware of the function underlying the data, I will only use this to generate input for the interpolation methods. I use numpy for convenience (and mostly for generating the data), but scipy alone would suffice too.
import numpy as np
import scipy.interpolate as interp

# auxiliary function for mesh generation
def gimme_mesh(n):
    minval = -1
    maxval =  1
    # produce an asymmetric shape in order to catch issues with transpositions
    return np.meshgrid(np.linspace(minval, maxval, n),
                       np.linspace(minval, maxval, n + 1))

# set up underlying test functions, vectorized
def fun_smooth(x, y):
    return np.cos(np.pi*x) * np.sin(np.pi*y)

def fun_evil(x, y):
    # watch out for singular origin; function has no unique limit there
    return np.where(x**2 + y**2 > 1e-10, x*y/(x**2 + y**2), 0.5)

# sparse input mesh, 6x7 in shape
N_sparse = 6
x_sparse, y_sparse = gimme_mesh(N_sparse)
z_sparse_smooth = fun_smooth(x_sparse, y_sparse)
z_sparse_evil = fun_evil(x_sparse, y_sparse)

# scattered input points, 10^2 altogether (shape (100,))
N_scattered = 10
rng = np.random.default_rng()
x_scattered, y_scattered = rng.random((2, N_scattered**2))*2 - 1
z_scattered_smooth = fun_smooth(x_scattered, y_scattered)
z_scattered_evil = fun_evil(x_scattered, y_scattered)

# dense output mesh, 20x21 in shape
N_dense = 20
x_dense, y_dense = gimme_mesh(N_dense)
Smooth function and upsampling
Let's start with the easiest task. Here's how an upsampling from a mesh of shape [6, 7] to one of [20, 21] works out for the smooth test function:

Even though this is a simple task, there are already subtle differences between the outputs. At a first glance all three outputs are reasonable. There are two features to note, based on our prior knowledge of the underlying function: the middle case of griddata distorts the data most. Note the y == -1 boundary of the plot (nearest to the x label): the function should be strictly zero (since y == -1 is a nodal line for the smooth function), yet this is not the case for griddata. Also note the x == -1 boundary of the plots (behind, to the left): the underlying function has a local maximum (implying zero gradient near the boundary) at [-1, -0.5], yet the griddata output shows clearly non-zero gradient in this region. The effect is subtle, but it's a bias nonetheless.

Evil function and upsampling
A slightly harder task is to perform upsampling on our evil function:

Clear differences are starting to show among the three methods. Looking at the surface plots, there are clear spurious extrema appearing in the output from interp2d (note the two humps on the right side of the plotted surface). While griddata and RBFInterpolator seem to produce similar results at first glance, both produce a local minimum near [0.4, -0.4] that is absent from the underlying function.

However, there is one crucial aspect in which RBFInterpolator is far superior: it respects the symmetry of the underlying function (which is of course also made possible by the symmetry of the sample mesh). The output from griddata breaks the symmetry of the sample points, which is already weakly visible in the smooth case.

Smooth function and scattered data
Most often one wants to perform interpolation on scattered data. For this reason I expect these tests to be more important. As shown above, the sample points were chosen pseudo-uniformly in the domain of interest. In realistic scenarios you might have additional noise with each measurement, and you should consider whether it makes sense to interpolate your raw data to begin with.

Output for the smooth function:

Now there's already a bit of a horror show going on. I clipped the output from interp2d to between [-1, 1] exclusively for plotting, in order to preserve at least a minimal amount of information. It's clear that while some of the underlying shape is present, there are huge noisy regions where the method completely breaks down. The second case of griddata reproduces the shape fairly nicely, but note the white regions at the border of the contour plot. This is due to the fact that griddata only works inside the convex hull of the input data points (in other words, it doesn't perform any extrapolation). I kept the default NaN value for output points lying outside the convex hull.^{2} Considering these features, RBFInterpolator seems to perform best.

Evil function and scattered data
And the moment we've all been waiting for:

It's no huge surprise that interp2d gives up. In fact, during the call to interp2d you should expect some friendly RuntimeWarnings complaining about the impossibility of the spline to be constructed. As for the other two methods, RBFInterpolator seems to produce the best output, even near the borders of the domain where the result is extrapolated.
So let me say a few words about the three methods, in decreasing order of preference (so that the worst is the least likely to be read by anybody).
scipy.interpolate.RBFInterpolator

The RBF in the name of the RBFInterpolator class stands for "radial basis functions". To be honest I've never considered this approach until I started researching for this post, but I'm pretty sure I'll be using these in the future.

Just like the spline-based methods (see later), usage comes in two steps: first one creates a callable RBFInterpolator class instance based on the input data, and then calls this object for a given output mesh to obtain the interpolated result. Example from the smooth upsampling test:

import scipy.interpolate as interp

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
dense_points = np.stack([x_dense.ravel(), y_dense.ravel()], -1)  # shape (N, 2) in 2d

zfun_smooth_rbf = interp.RBFInterpolator(sparse_points, z_sparse_smooth.ravel(),
                                         smoothing=0, kernel='cubic')  # explicit default smoothing=0 for interpolation
z_dense_smooth_rbf = zfun_smooth_rbf(dense_points).reshape(x_dense.shape)  # not really a function, but a callable class instance

zfun_evil_rbf = interp.RBFInterpolator(sparse_points, z_sparse_evil.ravel(),
                                       smoothing=0, kernel='cubic')  # explicit default smoothing=0 for interpolation
z_dense_evil_rbf = zfun_evil_rbf(dense_points).reshape(x_dense.shape)
Note that we had to do some array building gymnastics to make the API of RBFInterpolator happy. Since we have to pass the 2d points as arrays of shape (N, 2), we have to flatten the input grid and stack the two flattened arrays. The constructed interpolator also expects query points in this format, and the result will be a 1d array of shape (N,) which we have to reshape back to match our 2d grid for plotting. Since RBFInterpolator makes no assumptions about the number of dimensions of the input points, it supports arbitrary dimensions for interpolation.

So, scipy.interpolate.RBFInterpolator
- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- extrapolates outside the convex hull of the input points (of course extrapolation is always a gamble, and you should generally not rely on it at all)
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can have output point arrays of arbitrary shape (as opposed to being constrained to rectangular meshes, see later)
- is more likely to preserve the symmetry of the input data
- supports multiple kinds of radial functions for keyword kernel: multiquadric, inverse_multiquadric, inverse_quadratic, gaussian, linear, cubic, quintic, thin_plate_spline (the default). As of SciPy 1.7.0 the class doesn't allow passing a custom callable due to technical reasons, but this is likely to be added in a future version.
- can give inexact interpolations by increasing the smoothing parameter

One drawback of RBF interpolation is that interpolating N data points involves inverting an N x N matrix. This quadratic complexity very quickly blows up memory need for a large number of data points. However, the new RBFInterpolator class also supports a neighbors keyword parameter that restricts the computation of each radial basis function to k nearest neighbours, thereby reducing memory need.
scipy.interpolate.griddata
My former favourite, griddata, is a general workhorse for interpolation in arbitrary dimensions. It doesn't perform extrapolation beyond setting a single preset value for points outside the convex hull of the nodal points, but since extrapolation is a very fickle and dangerous thing, this is not necessarily a con. Usage example:

sparse_points = np.stack([x_sparse.ravel(), y_sparse.ravel()], -1)  # shape (N, 2) in 2d
z_dense_smooth_griddata = interp.griddata(sparse_points, z_sparse_smooth.ravel(),
                                          (x_dense, y_dense), method='cubic')  # default method is linear

Note that the same array transformations were necessary for the input arrays as for RBFInterpolator. The input points have to be specified in an array of shape [N, D] in D dimensions, or alternatively as a tuple of 1d arrays:

z_dense_smooth_griddata = interp.griddata((x_sparse.ravel(), y_sparse.ravel()),
                                          z_sparse_smooth.ravel(),
                                          (x_dense, y_dense), method='cubic')
The output point arrays can be specified as a tuple of arrays of arbitrary dimensions (as in both above snippets), which gives us some more flexibility.
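griddata's handling of points outside the convex hull can also be seen directly; the fill_value keyword replaces the default NaN for such points (a small sketch with made-up data):

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
points = rng.random((200, 2))         # scattered points inside the unit square
values = points[:, 0] + points[:, 1]  # a linear function, so interpolation is exact

# (0.5, 0.5) lies inside the convex hull; (5, 5) lies far outside it
inside = griddata(points, values, [(0.5, 0.5)], method='linear')
outside = griddata(points, values, [(5.0, 5.0)], method='linear', fill_value=-9.0)

print(inside)   # linear interpolation reproduces x + y exactly: [1.]
print(outside)  # no extrapolation, the fill value is returned: [-9.]
```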
In a nutshell, scipy.interpolate.griddata

- produces well-behaved output even for crazy input data
- supports interpolation in higher dimensions
- does not perform extrapolation; a single value can be set for the output outside the convex hull of the input points (see fill_value)
- computes the interpolated values in a single call, so probing multiple sets of output points starts from scratch
- can have output points of arbitrary shape
- supports nearest-neighbour and linear interpolation in arbitrary dimensions, cubic in 1d and 2d. Nearest-neighbour and linear interpolation use NearestNDInterpolator and LinearNDInterpolator under the hood, respectively. 1d cubic interpolation uses a spline, 2d cubic interpolation uses CloughTocher2DInterpolator to construct a continuously differentiable piecewise-cubic interpolator.
- might violate the symmetry of the input data
scipy.interpolate.interp2d/scipy.interpolate.bisplrep

The only reason I'm discussing interp2d and its relatives is that it has a deceptive name, and people are likely to try using it. Spoiler alert: don't use it (as of scipy version 1.7.0). It's already more special than the previous subjects in that it's specifically used for two-dimensional interpolation, but I suspect this is by far the most common case for multivariate interpolation.

As far as syntax goes, interp2d is similar to RBFInterpolator in that it first needs constructing an interpolation instance, which can be called to provide the actual interpolated values. There's a catch, however: the output points have to be located on a rectangular mesh, so inputs going into the call to the interpolator have to be 1d vectors which span the output grid, as if from numpy.meshgrid:

# reminder: x_sparse and y_sparse are of shape [6, 7] from numpy.meshgrid
zfun_smooth_interp2d = interp.interp2d(x_sparse, y_sparse, z_sparse_smooth, kind='cubic')  # default kind is 'linear'

# reminder: x_dense and y_dense are of shape (20, 21) from numpy.meshgrid
xvec = x_dense[0, :]  # 1d array of unique x values, 20 elements
yvec = y_dense[:, 0]  # 1d array of unique y values, 21 elements
z_dense_smooth_interp2d = zfun_smooth_interp2d(xvec, yvec)  # output is (20, 21)-shaped array
One of the most common mistakes when using interp2d is putting your full 2d meshes into the interpolation call, which leads to explosive memory consumption, and hopefully to a hasty MemoryError.

Now, the greatest problem with interp2d is that it often doesn't work. In order to understand this, we have to look under the hood. It turns out that interp2d is a wrapper for the lower-level functions bisplrep + bisplev, which are in turn wrappers for FITPACK routines (written in Fortran). The equivalent call to the previous example would be

kind = 'cubic'
if kind == 'linear':
    kx = ky = 1
elif kind == 'cubic':
    kx = ky = 3
elif kind == 'quintic':
    kx = ky = 5

# bisplrep constructs a spline representation, bisplev evaluates the spline at given points
bisp_smooth = interp.bisplrep(x_sparse.ravel(), y_sparse.ravel(), z_sparse_smooth.ravel(), kx=kx, ky=ky, s=0)
z_dense_smooth_bisplrep = interp.bisplev(xvec, yvec, bisp_smooth).T  # note the transpose
Now, here's the thing about interp2d: (in scipy version 1.7.0) there is a nice comment in interpolate/interpolate.py for interp2d:

if not rectangular_grid:
    # TODO: surfit is really not meant for interpolation!
    self.tck = fitpack.bisplrep(x, y, z, kx=kx, ky=ky, s=0.0)

and indeed in interpolate/fitpack.py, in bisplrep there's some setup and ultimately

tx, ty, c, o = _fitpack._surfit(x, y, z, w, xb, xe, yb, ye, kx, ky,
                                task, s, eps, tx, ty, nxest, nyest,
                                wrk, lwrk1, lwrk2)
And that's it. The routines underlying interp2d are not really meant to perform interpolation. They might suffice for sufficiently well-behaved data, but under realistic circumstances you will probably want to use something else.

Just to conclude, interpolate.interp2d

- can lead to artifacts even with well-tempered data
- is specifically for bivariate problems (although there's the limited interpn for input points defined on a grid)
- performs extrapolation
- creates an interpolator as a first step, so evaluating it in various output points is less additional effort
- can only produce output over a rectangular grid; for scattered output you would have to call the interpolator in a loop
- supports linear, cubic and quintic interpolation
- might violate the symmetry of the input data
^{1} I'm fairly certain that the cubic and linear kind of basis functions of RBFInterpolator do not exactly correspond to the other interpolators of the same name.

^{2} These NaNs are also the reason for why the surface plot seems so odd: matplotlib historically has difficulties with plotting complex 3d objects with proper depth information. The NaN values in the data confuse the renderer, so parts of the surface that should be in the back are plotted to be in the front. This is an issue with visualization, and not interpolation.

Answer #5
To save some folks some time, here is a list I extracted from a small corpus. I do not know if it is complete, but it should have most (if not all) of the help definitions from upenn_tagset...
CC: conjunction, coordinating
& 'n and both but either et for less minus neither nor or plus so therefore times v. versus vs. whether yet
CD: numeral, cardinal
mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025 fifteen 271,124 dozen quintillion DM2,000 ...
DT: determiner
all an another any both del each either every half la many much nary neither no some such that the them these this those
EX: existential there
there
IN: preposition or conjunction, subordinating
astride among upon whether out inside pro despite on by throughout below within for towards near behind atop around if like until below next into if beside ...
JJ: adjective or numeral, ordinal
third ill-mannered pre-war regrettable oiled calamitous first separable ectoplasmic battery-powered participatory fourth still-to-be-named multilingual multi-disciplinary ...
JJR: adjective, comparative
bleaker braver breezier briefer brighter brisker broader bumper busier calmer cheaper choosier cleaner clearer closer colder commoner costlier cozier creamier crunchier cuter ...
JJS: adjective, superlative
calmest cheapest choicest classiest cleanest clearest closest commonest corniest costliest crassest creepiest crudest cutest darkest deadliest dearest deepest densest dinkiest ...
LS: list item marker
A A. B B. C C. D E F First G H I J K One SP44001 SP44002 SP44005 SP44007 Second Third Three Two * a b c d first five four one six three two
MD: modal auxiliary
can cannot could couldn't dare may might must need ought shall should shouldn't will would
NN: noun, common, singular or mass
common-carrier cabbage knuckle-duster Casino afghan shed thermostat investment slide humour fall-off slick wind hyena override subhumanity machinist ...
NNP: noun, proper, singular
Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA Shannon A.K.C. Meltex Liverpool ...
NNS: noun, common, plural
undergraduates scotches bric-a-brac products bodyguards facets coasts divestitures storehouses designs clubs fragrances averages subjectivists apprehensions muses factory-jobs ...
PDT: predeterminer
all both half many quite such sure this
POS: genitive marker
' 's
PRP: pronoun, personal
hers herself him himself hisself it itself me myself one oneself ours ourselves ownself self she thee theirs them themselves they thou thy us
PRP$: pronoun, possessive
her his mine my our ours their thy your
RB: adverb
occasionally unabatingly maddeningly adventurously professedly stirringly prominently technologically magisterially predominately swiftly fiscally pitilessly ...
RBR: adverb, comparative
further gloomier grander graver greater grimmer harder harsher healthier heavier higher however larger later leaner lengthier less perfectly lesser lonelier longer louder lower more ...
RBS: adverb, superlative
best biggest bluntest earliest farthest first furthest hardest heartiest highest largest least less most nearest second tightest worst
RP: particle
aboard about across along apart around aside at away back before behind by crop down ever fast for forth from go high i.e. in into just later low more off on open out over per pie raising start teeth that through under unto up uppp upon whole with you
TO: "to" as preposition or infinitive marker
to
UH: interjection
Goodbye Goody Gosh Wow Jeepers Jeesus Hubba Hey Keereist Oops amen huh howdy uh dammit whammo shucks heck anyways whodunnit honey golly man baby diddle hush sonuvabitch ...
VB: verb, base form
ask assemble assess assign assume atone attention avoid bake balkanize bank begin behold believe bend benefit bevel beware bless boil bomb boost brace break bring broil brush build ...
VBD: verb, past tense
dipped pleaded swiped regummed soaked tidied convened halted registered cushioned exacted snubbed strode aimed adopted belied figgered speculated wore appreciated contemplated ...
VBG: verb, present participle or gerund
telegraphing stirring focusing angering judging stalling lactating hankerin' alleging veering capping approaching traveling besieging encrypting interrupting erasing wincing ...
VBN: verb, past participle
multihulled dilapidated aerosolized chaired languished panelized used experimented flourished imitated reunifed factored condensed sheared unsettled primed dubbed desired ...
VBP: verb, present tense, not 3rd person singular
predominate wrap resort sue twist spill cure lengthen brush terminate appear tend stray glisten obtain comprise detest tease attract emphasize mold postpone sever return wag ...
VBZ: verb, present tense, 3rd person singular
bases reconstructs marks mixes displeases seals carps weaves snatches slumps stretches authorizes smolders pictures emerges stockpiles seduces fizzes uses bolsters slaps speaks pleads ...
WDT: WH-determiner
that what whatever which whichever
WP: WH-pronoun
that what whatever whatsoever which who whom whosoever
WRB: Wh-adverb
how however whence whenever where whereby whereever wherein whereof why
Answer #6
The answer below pertains primarily to Signed Cookies, an implementation of the concept of sessions (as used in web applications). Flask offers both normal (unsigned) cookies (via request.cookies and response.set_cookie()) and signed cookies (via flask.session). The answer has two parts: the first describes how a Signed Cookie is generated, and the second is presented in the form of a Q&A that addresses different aspects of the scheme. The syntax used for the examples is Python 3, but the concepts apply also to previous versions.

What is SECRET_KEY (or how to create a Signed Cookie)?

Signing cookies is a preventive measure against cookie tampering. During the process of signing a cookie, the SECRET_KEY is used in a way similar to how a "salt" would be used to muddle a password before hashing it. Here's a (wildly) simplified description of the concept. The code in the examples is meant to be illustrative. Many of the steps have been omitted and not all of the functions actually exist. The goal here is to provide an understanding of the general idea; actual implementations will be a bit more involved. Also, keep in mind that Flask does most of this for you in the background. So, besides setting values to your cookie (via the session API) and providing a SECRET_KEY, it's not only ill-advised to reimplement this yourself, but there's no need to do so:

A poor man's cookie signature
Before sending a Response to the browser:
( 1 ) First a SECRET_KEY is established. It should only be known to the application and should be kept relatively constant during the application's life cycle, including through application restarts.

# choose a salt, a secret string of bytes
>>> SECRET_KEY = 'my super secret key'.encode('utf8')
( 2 ) create a cookie
>>> cookie = make_cookie(
...     name='_profile',
...     content='uid=382|membership=regular',
...     ...
...     expires='July 1 2030...'
... )

>>> print(cookie)
name: _profile
content: uid=382|membership=regular...
...
expires: July 1 2030, 1:20:40 AM UTC
( 3 ) To create a signature, append (or prepend) the SECRET_KEY to the cookie byte string, then generate a hash from that combination.

# encode and salt the cookie, then hash the result
>>> cookie_bytes = str(cookie).encode('utf8')
>>> signature = sha1(cookie_bytes + SECRET_KEY).hexdigest()
>>> print(signature)
7ae0e9e033b5fa53aa....
( 4 ) Now affix the signature at one end of the content field of the original cookie.

# include signature as part of the cookie
>>> cookie.content = cookie.content + '|' + signature

>>> print(cookie)
name: _profile
content: uid=382|membership=regular|7ae0e9...  <-- signature
domain: .example.com
path: /
send for: Encrypted connections only
expires: July 1 2030, 1:20:40 AM UTC
and that"s what"s sent to the client.
# add cookie to response
>>> response.set_cookie(cookie)

# send to browser -->
Upon receiving the cookie from the browser:
( 5 ) When the browser returns this cookie back to the server, strip the signature from the cookie's content field to get back the original cookie.

# Upon receiving the cookie from browser
>>> cookie = request.get_cookie()

# pop the signature out of the cookie
>>> (cookie.content, popped_signature) = cookie.content.rsplit('|', 1)
( 6 ) Use the original cookie with the application's SECRET_KEY to recalculate the signature using the same method as in step 3.

# recalculate signature using SECRET_KEY and original cookie
>>> cookie_bytes = str(cookie).encode('utf8')
>>> calculated_signature = sha1(cookie_bytes + SECRET_KEY).hexdigest()
( 7 ) Compare the calculated result with the signature previously popped out of the just received cookie. If they match, we know that the cookie has not been messed with. But if even just a space has been added to the cookie, the signatures won"t match.
# if both signatures match, your cookie has not been modified
>>> good_cookie = popped_signature == calculated_signature
( 8 ) If they don't match, then you may respond with any number of actions: log the event, discard the cookie, issue a fresh one, redirect to a login page, etc.
>>> if not good_cookie:
...     security_log(cookie)
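Putting steps 2 through 7 together, the scheme can be sketched as a small self-contained script. The `|` separator and the key value are illustrative placeholders, and `sha1(content + key)` mirrors the simplified scheme above, not Flask's actual HMAC-based one:

```python
from hashlib import sha1

SECRET_KEY = b"change-me"  # placeholder key, for illustration only

def sign(content: str) -> str:
    # hash the content together with the secret key (simplified scheme)
    return sha1(content.encode("utf8") + SECRET_KEY).hexdigest()

# server side: append the signature before sending the cookie
content = "uid=382|membership=regular"
cookie_value = content + "|" + sign(content)

# on the next request: strip the signature and recompute it
received, popped_signature = cookie_value.rsplit("|", 1)
good_cookie = popped_signature == sign(received)       # True: untouched

# a tampered cookie fails the check
tampered, sig = cookie_value.replace("regular", "admin").rsplit("|", 1)
bad_cookie = sig == sign(tampered)                     # False: rejected
```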
Hash-based Message Authentication Code (HMAC)
The type of signature generated above, which requires a secret key to ensure the integrity of some contents, is called in cryptography a Message Authentication Code, or MAC.
I specified earlier that the example above is an oversimplification of that concept and that it wasn't a good idea to implement your own signing. That's because the algorithm used to sign cookies in Flask is called HMAC and is a bit more involved than the simple step-by-step above. The general idea is the same, but due to reasons beyond the scope of this discussion, the series of computations is a tad bit more complex. If you're still interested in crafting a DIY solution, as is usually the case, Python has some modules to help you get started :) here's a starting block:
import hmac
import hashlib

def create_signature(secret_key, msg, digestmod=None):
    if digestmod is None:
        digestmod = hashlib.sha1
    mac = hmac.new(secret_key, msg=msg, digestmod=digestmod)
    return mac.digest()
The documentation for hmac and hashlib.
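For completeness, a hedged sketch of how the starting block above might be used for verification; `hmac.compare_digest` performs a constant-time comparison, which is the right tool for comparing signatures:

```python
import hmac
import hashlib

def create_signature(secret_key, msg, digestmod=None):
    if digestmod is None:
        digestmod = hashlib.sha1
    mac = hmac.new(secret_key, msg=msg, digestmod=digestmod)
    return mac.digest()

def verify_signature(secret_key, msg, signature, digestmod=None):
    # constant-time comparison avoids timing side channels
    expected = create_signature(secret_key, msg, digestmod)
    return hmac.compare_digest(expected, signature)

sig = create_signature(b"secret", b"uid=382")
ok = verify_signature(b"secret", b"uid=382", sig)         # True
forged = verify_signature(b"wrong-key", b"uid=382", sig)  # False
```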
The "Demystification" of
SECRET_KEY
:)What"s a "signature" in this context?
It"s a method to ensure that some content has not been modified by anyone other than a person or an entity authorized to do so.
One of the simplest forms of signature is the "checksum", which simply verifies that two pieces of data are the same. For example, when installing software from source it's important to first confirm that your copy of the source code is identical to the author's. A common approach to do this is to run the source through a cryptographic hash function and compare the output with the checksum published on the project's home page.
Let"s say for instance that you"re about to download a project"s source in a gzipped file from a web mirror. The SHA1 checksum published on the project"s web page is "eb84e8da7ca23e9f83...."
# so you get the code from the mirror
download https://mirror.examplecodedump.com/source_code.tar.gz
# you calculate the hash as instructed
sha1(source_code.tar.gz)
> eb84e8da7c....
If both hashes are the same, you know that you have an identical copy.
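The same comparison can be sketched in a few lines of Python (the file bytes and the published digest here are made up for illustration):

```python
import hashlib

# pretend these are the bytes of the downloaded source archive
downloaded_bytes = b"pretend tarball contents"

# the checksum the project would publish on its page
published_checksum = hashlib.sha1(b"pretend tarball contents").hexdigest()

# recompute the hash of your local copy and compare
local_checksum = hashlib.sha1(downloaded_bytes).hexdigest()
identical = local_checksum == published_checksum  # True for an intact copy
```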
What"s a cookie?
An extensive discussion on cookies would go beyond the scope of this question. I provide an overview here since a minimal understanding can be useful to have a better understanding of how and why SECRET_KEY is useful. I highly encourage you to follow up with some personal readings on HTTP cookies.

A common practice in web applications is to use the client (web browser) as a lightweight cache. Cookies are one implementation of this practice. A cookie is typically some data added by the server to an HTTP response by way of its headers. It's kept by the browser, which subsequently sends it back to the server when issuing requests, also by way of HTTP headers. The data contained in a cookie can be used to emulate what's called statefulness, the illusion that the server is maintaining an ongoing connection with the client. Only, in this case, instead of a wire to keep the connection "alive", you simply have snapshots of the state of the application after it has handled a client's request. These snapshots are carried back and forth between client and server. Upon receiving a request, the server first reads the content of the cookie to reestablish the context of its conversation with the client. It then handles the request within that context and, before returning the response to the client, updates the cookie. The illusion of an ongoing session is thus maintained.
What does a cookie look like?
A typical cookie would look like this:
name: _profile
content: uid=382|status=genie
domain: .example.com
path: /
send for: Encrypted connections only
expires: July 1 2030, 1:20:40 AM UTC
Cookies are trivial to peruse from any modern browser. On Firefox for example go to Preferences > Privacy > History > remove individual cookies.
The
content
field is the most relevant to the application. Other fields carry mostly meta instructions to specify various scopes of influence.

Why use cookies at all?
The short answer is performance. Using cookies minimizes the need to look things up in various data stores (memory caches, files, databases, etc.), thus speeding things up on the server application's side. Keep in mind that the bigger the cookie, the heavier the payload over the network, so what you save in database lookups on the server you might lose over the network. Consider carefully what to include in your cookies.
Why would cookies need to be signed?
Cookies are used to keep all sorts of information, some of which can be very sensitive. They're also by nature not safe and require that a number of auxiliary precautions be taken to be considered secure in any way for both parties, client and server. Signing cookies specifically addresses the problem that they can be tinkered with in attempts to fool server applications. There are other measures to mitigate other types of vulnerabilities; I encourage you to read up more on cookies.
How can a cookie be tampered with?
Cookies reside on the client in text form and can be edited with no effort. A cookie received by your server application could have been modified for a number of reasons, some of which may not be innocent. Imagine a web application that keeps permission information about its users in cookies and grants privileges based on that information. If the cookie is not tinker-proof, anyone could modify theirs to elevate their status from "role=visitor" to "role=admin" and the application would be none the wiser.
Why is a
SECRET_KEY
necessary to sign cookies?

Verifying cookies is a tad bit different from verifying source code the way it's described earlier. In the case of the source code, the original author is the trustee and owner of the reference fingerprint (the checksum), which will be kept public. What you don't trust is the source code, but you trust the public signature. So to verify your copy of the source you simply want your calculated hash to match the public hash.
In the case of a cookie, however, the application doesn't keep track of the signature, it keeps track of its SECRET_KEY. The SECRET_KEY is the reference fingerprint. Cookies travel with a signature that they claim to be legit. Legitimacy here means that the signature was issued by the owner of the cookie, that is the application, and in this case it's that claim that you don't trust, so you need to check the signature for validity. To do that you need to include an element in the signature that is known only to you: that's the SECRET_KEY. Someone may change a cookie, but since they don't have the secret ingredient to properly calculate a valid signature, they cannot spoof it. As stated a bit earlier, this type of fingerprinting, where on top of the checksum one also provides a secret key, is called a Message Authentication Code.

What about Sessions?
Sessions in their classical implementation are cookies that carry only an ID in the content field, the session_id. The purpose of sessions is exactly the same as signed cookies, i.e. to prevent cookie tampering. Classical sessions have a different approach though. Upon receiving a session cookie, the server uses the ID to look up the session data in its own local storage, which could be a database, a file, or sometimes a cache in memory. The session cookie is typically set to expire when the browser is closed. Because of the local storage lookup step, this implementation of sessions typically incurs a performance hit. Signed cookies are becoming a preferred alternative, and that's how Flask's sessions are implemented. In other words, Flask sessions are signed cookies, and to use signed cookies in Flask just use its Session API.

Why not also encrypt the cookies?
Sometimes the contents of cookies can be encrypted before also being signed. This is done if they're deemed too sensitive to be visible from the browser (encryption hides the contents). Simply signing cookies, however, addresses a different need: a desire to maintain a degree of visibility and usability of cookies on the browser, while preventing them from being meddled with.
What happens if I change the
SECRET_KEY
?

By changing the SECRET_KEY you're invalidating all cookies signed with the previous key. When the application receives a request with a cookie that was signed with a previous SECRET_KEY, it will try to calculate the signature with the new SECRET_KEY; the two signatures won't match, so the cookie and all its data will be rejected. It will be as if the browser is connecting to the server for the first time. Users will be logged out and their old cookie will be forgotten, along with anything stored inside. Note that this is different from the way an expired cookie is handled. An expired cookie may have its lease extended if its signature checks out. An invalid signature just implies a plain invalid cookie.

So unless you want to invalidate all signed cookies, try to keep the SECRET_KEY the same for extended periods.

What's a good SECRET_KEY?

A secret key should be hard to guess. The documentation on Sessions has a good recipe for random key generation:
>>> import os
>>> os.urandom(24)
'\xfd{H\xe5<\x95\xf9\xe3\x96.5\xd1\x01O<!\xd5\xa2\xa0\x9fR"\xa1\xa8'
You copy the key and paste it in your configuration file as the value of
SECRET_KEY
.

Short of using a key that was randomly generated, you could use a complex assortment of words, numbers, and symbols, perhaps arranged in a sentence known only to you, encoded in byte form.
Do not set the
SECRET_KEY
directly with a function that generates a different key each time it's called. For example, don't do this:

# this is not good
SECRET_KEY = random_key_generator()
Each time your application is restarted it will be given a new key, thus invalidating the previous one.
Instead, open an interactive Python shell and call the function to generate the key, then copy and paste it into the config.
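On Python 3.6+, the secrets module offers an equally good one-off generator; the 24-byte length below simply matches the os.urandom example, and any comparable length works:

```python
import secrets

# run this once in an interactive shell, then paste the literal into config
key = secrets.token_hex(24)  # 24 random bytes as a 48-character hex string
print(key)
```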
Answer #7
Actually the purpose of
np.meshgrid
is already mentioned in the documentation:

Return coordinate matrices from coordinate vectors.
Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2, ..., xn.
So it"s primary purpose is to create a coordinates matrices.
You probably just asked yourself:
Why do we need to create coordinate matrices?
The reason you need coordinate matrices with Python/NumPy is that there is no direct relation from coordinates to values, except when your coordinates start with zero and are purely positive integers. Then you can just use the indices of an array as the index. However, when that's not the case, you somehow need to store coordinates alongside your data. That's where grids come in.
Suppose your data is:
1 2 1
2 5 2
1 2 1
However, each value represents a 3 x 2 kilometer area (horizontal x vertical). Suppose your origin is the upper left corner and you want arrays that represent the distances; you could use:
import numpy as np
h, v = np.meshgrid(np.arange(3)*3, np.arange(3)*2)
where v is:
array([[0, 0, 0],
       [2, 2, 2],
       [4, 4, 4]])
and h:
array([[0, 3, 6],
       [0, 3, 6],
       [0, 3, 6]])
So if you have two indices, let's say x and y (that's why the return value of meshgrid is usually xx or xs instead of x; in this case I chose h for horizontal!), then you can get the x coordinate of the point, the y coordinate of the point and the value at that point by using:

h[x, y]     # horizontal coordinate
v[x, y]     # vertical coordinate
data[x, y]  # value
That makes it much easier to keep track of coordinates and (even more importantly) you can pass them to functions that need to know the coordinates.
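Putting the pieces above together into a runnable snippet (the data array is the sample 3 x 3 grid from earlier):

```python
import numpy as np

data = np.array([[1, 2, 1],
                 [2, 5, 2],
                 [1, 2, 1]])

# physical coordinates: each cell spans 3 km horizontally, 2 km vertically
h, v = np.meshgrid(np.arange(3) * 3, np.arange(3) * 2)

x, y = 1, 2  # array indices of one cell
print(h[x, y], v[x, y], data[x, y])  # horizontal km, vertical km, value
```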
A slightly longer explanation
However, np.meshgrid itself isn't often used directly; mostly one just uses one of the similar objects np.mgrid or np.ogrid. Here np.mgrid represents the sparse=False and np.ogrid the sparse=True case (I refer to the sparse argument of np.meshgrid). Note that there is a significant difference between np.meshgrid on the one hand and np.ogrid and np.mgrid on the other: the first two returned values (if there are two or more) are reversed. Often this doesn't matter, but you should give meaningful variable names depending on the context.

For example, in the case of a 2D grid and matplotlib.pyplot.imshow it makes sense to name the first returned item of np.meshgrid x and the second one y, while it's the other way around for np.mgrid and np.ogrid.
np.ogrid and sparse grids

>>> import numpy as np
>>> yy, xx = np.ogrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])
As already said, the output is reversed when compared to np.meshgrid; that's why I unpacked it as yy, xx instead of xx, yy:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6), sparse=True)
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])
This already looks like coordinates, specifically the x and y lines for 2D plots.
Visualized:
yy, xx = np.ogrid[-5:6, -5:6]
plt.figure()
plt.title("ogrid (sparse meshgrid)")
plt.grid()
plt.xticks(xx.ravel())
plt.yticks(yy.ravel())
plt.scatter(xx, np.zeros_like(xx), color="blue", marker="*")
plt.scatter(np.zeros_like(yy), yy, color="red", marker="x")
np.mgrid and dense/fleshed-out grids

>>> yy, xx = np.mgrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])
The same applies here: the output is reversed compared to np.meshgrid:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])
Unlike ogrid, these arrays contain all xx and yy coordinates in the -5 <= xx <= 5; -5 <= yy <= 5 grid.

yy, xx = np.mgrid[-5:6, -5:6]
plt.figure()
plt.title("mgrid (dense meshgrid)")
plt.grid()
plt.xticks(xx[0])
plt.yticks(yy[:, 0])
plt.scatter(xx, yy, color="red", marker="x")
Functionality
It"s not only limited to 2D, these functions work for arbitrary dimensions (well, there is a maximum number of arguments given to function in Python and a maximum number of dimensions that NumPy allows):
>>> x1, x2, x3, x4 = np.ogrid[:3, 1:4, 2:5, 3:6]
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print("x{}".format(i+1))
...     print(repr(x))
x1
array([[[[0]]],
       [[[1]]],
       [[[2]]]])
x2
array([[[[1]],
        [[2]],
        [[3]]]])
x3
array([[[[2],
         [3],
         [4]]]])
x4
array([[[[3, 4, 5]]]])
>>> # equivalent meshgrid output, note how the first two arguments are reversed and the unpacking
>>> x2, x1, x3, x4 = np.meshgrid(np.arange(1, 4), np.arange(3), np.arange(2, 5), np.arange(3, 6), sparse=True)
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print("x{}".format(i+1))
...     print(repr(x))
# identical output, so it's omitted here
These also work for 1D, but there are two (much more common) 1D grid creation functions:
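The two functions meant here are presumably np.arange and np.linspace; a quick comparison of their conventions:

```python
import numpy as np

# np.arange: half-open interval, spacing given explicitly
a = np.arange(0.0, 1.0, 0.25)   # stop value 1.0 is excluded

# np.linspace: closed interval, number of points given explicitly
b = np.linspace(0.0, 1.0, 5)    # both endpoints included
```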
Besides the start and stop arguments, it also supports the step argument (even complex steps that represent the number of steps):

>>> x1, x2 = np.mgrid[1:10:2, 1:10:4j]
>>> x1  # the dimension with the explicit step width of 2
array([[1., 1., 1., 1.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [7., 7., 7., 7.],
       [9., 9., 9., 9.]])
>>> x2  # the dimension with the "number of steps"
array([[ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.]])
Applications
You specifically asked about the purpose and in fact, these grids are extremely useful if you need a coordinate system.
For example if you have a NumPy function that calculates the distance in two dimensions:
def distance_2d(x_point, y_point, x, y):
    return np.hypot(x - x_point, y - y_point)
And you want to know the distance of each point:
>>> ys, xs = np.ogrid[-5:5, -5:5]
>>> distances = distance_2d(1, 2, xs, ys)  # distance to point (1, 2)
>>> distances
array([[9.21954446, 8.60232527, 8.06225775, 7.61577311, 7.28010989,
        7.07106781, 7.        , 7.07106781, 7.28010989, 7.61577311],
       [8.48528137, 7.81024968, 7.21110255, 6.70820393, 6.32455532,
        6.08276253, 6.        , 6.08276253, 6.32455532, 6.70820393],
       [7.81024968, 7.07106781, 6.40312424, 5.83095189, 5.38516481,
        5.09901951, 5.        , 5.09901951, 5.38516481, 5.83095189],
       [7.21110255, 6.40312424, 5.65685425, 5.        , 4.47213595,
        4.12310563, 4.        , 4.12310563, 4.47213595, 5.        ],
       [6.70820393, 5.83095189, 5.        , 4.24264069, 3.60555128,
        3.16227766, 3.        , 3.16227766, 3.60555128, 4.24264069],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.        , 5.        , 4.        , 3.        , 2.        ,
        1.        , 0.        , 1.        , 2.        , 3.        ],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128]])
The output would be identical if one passed in a dense grid instead of an open grid. NumPy's broadcasting makes it possible!
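That claim is easy to verify: broadcasting expands the (10, 1) and (1, 10) arrays of the open grid to the same (10, 10) result that the dense grid produces directly:

```python
import numpy as np

def distance_2d(x_point, y_point, x, y):
    return np.hypot(x - x_point, y - y_point)

# open (sparse) grid: shapes (10, 1) and (1, 10)
ys, xs = np.ogrid[-5:5, -5:5]
sparse_result = distance_2d(1, 2, xs, ys)

# dense grid: both arrays have shape (10, 10)
yy, xx = np.mgrid[-5:5, -5:5]
dense_result = distance_2d(1, 2, xx, yy)

# broadcasting makes both results identical
same = np.array_equal(sparse_result, dense_result)  # True
```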
Let"s visualize the result:
plt.figure()
plt.title("distance to point (1, 2)")
plt.imshow(distances, origin="lower", interpolation="none")
plt.xticks(np.arange(xs.shape[1]), xs.ravel())  # need to set the ticks manually
plt.yticks(np.arange(ys.shape[0]), ys.ravel())
plt.colorbar()
And this is also where NumPy's mgrid and ogrid become very convenient, because they allow you to easily change the resolution of your grids:

ys, xs = np.ogrid[-5:5:200j, -5:5:200j]
# otherwise same code as above
However, since imshow doesn't support x and y inputs, one has to change the ticks by hand. It would be really convenient if it would accept the x and y coordinates, right?

It's easy to write functions with NumPy that deal naturally with grids. Furthermore, there are several functions in NumPy, SciPy, and matplotlib that expect you to pass in the grid.
I like images, so let's explore matplotlib.pyplot.contour:

ys, xs = np.mgrid[-5:5:200j, -5:5:200j]
density = np.sin(ys) * np.cos(xs)
plt.figure()
plt.contour(xs, ys, density)
Note how the coordinates are already correctly set! That wouldn't be the case if you just passed in the density.

Or to give another fun example using astropy models (this time I don't care much about the coordinates, I just use them to create some grid):
from astropy.modeling import models

z = np.zeros((100, 100))
y, x = np.mgrid[0:100, 0:100]
for _ in range(10):
    g2d = models.Gaussian2D(amplitude=100,
                            x_mean=np.random.randint(0, 100),
                            y_mean=np.random.randint(0, 100),
                            x_stddev=3, y_stddev=3)
    z += g2d(x, y)
    a2d = models.AiryDisk2D(amplitude=70,
                            x_0=np.random.randint(0, 100),
                            y_0=np.random.randint(0, 100),
                            radius=5)
    z += a2d(x, y)
Although that"s just "for the looks" several functions related to functional models and fitting (for example
scipy.interpolate.interp2d
,scipy.interpolate.griddata
even show examples usingnp.mgrid
) in Scipy, etc. require grids. Most of these work with open grids and dense grids, however some only work with one of them.Answer #8
Pandas >= 0.25
Series and DataFrame methods define a .explode() method that explodes lists into separate rows. See the docs section on Exploding a list-like column.

Since you have a list of comma-separated strings, split the string on comma to get a list of elements, then call explode on that column.

df = pd.DataFrame({"var1": ["a,b,c", "d,e,f"], "var2": [1, 2]})
df
    var1  var2
0  a,b,c     1
1  d,e,f     2

df.assign(var1=df["var1"].str.split(",")).explode("var1")
  var1  var2
0    a     1
0    b     1
0    c     1
1    d     2
1    e     2
1    f     2
Note that explode only works on a single column (for now). To explode multiple columns at once, see below.
df = pd.DataFrame({"var1": ["d,e,f", "", np.nan], "var2": [1, 2, 3]}) df var1 var2 0 d,e,f 1 1 2 2 NaN 3 df["var1"].str.split(",") 0 [d, e, f] 1 [] 2 NaN df.assign(var1=df["var1"].str.split(",")).explode("var1") var1 var2 0 d 1 0 e 1 0 f 1 1 2 # empty list entry becomes empty string after exploding 2 NaN 3 # NaN left untouched
This is a serious advantage over ravel/repeat based solutions (which ignore empty lists completely, and choke on NaNs).
Exploding Multiple Columns
Note that explode only works on a single column at a time, but you can use apply to explode multiple columns at once:

df = pd.DataFrame({"var1": ["a,b,c", "d,e,f"],
                   "var2": ["i,j,k", "l,m,n"],
                   "var3": [1, 2]})
df
    var1   var2  var3
0  a,b,c  i,j,k     1
1  d,e,f  l,m,n     2

(df.set_index(["var3"])
   .apply(lambda col: col.str.split(",").explode())
   .reset_index()
   .reindex(df.columns, axis=1))

  var1 var2  var3
0    a    i     1
1    b    j     1
2    c    k     1
3    d    l     2
4    e    m     2
5    f    n     2
The idea is to set as the index all the columns that should NOT be exploded, then explode the remaining columns via apply. This works well when the lists are equally sized.

Answer #9
As explained here, a key difference is that:

flatten is a method of an ndarray object and hence can only be called on true numpy arrays.

ravel is a library-level function and hence can be called on any object that can successfully be parsed.

For example, ravel will work on a list of ndarrays, while flatten is not available for that type of object.

@IanH also points out important differences with memory handling in his answer.
Answer #10
I realize this is old, but I figured I'd clear up a misconception for other travelers. Setting plt.pyplot.isinteractive() to False means that the plot will only be drawn on specific commands to draw (i.e. plt.pyplot.show()). Setting plt.pyplot.isinteractive() to True means that every pyplot (plt) command will trigger a draw command (i.e. plt.pyplot.show()). So what you were more than likely looking for is plt.pyplot.show() at the end of your program to display the graph.

As a side note, you can shorten these statements a bit by using the following import command: import matplotlib.pyplot as plt rather than matplotlib as plt.