Future#
Show code cell source
loading_params = dict(
delimiter="\t", skiprows=1, dtype='U', usecols=0, comments=None
)
biggest_impact = np.loadtxt(
"data/2020/biggestimpact_comments_master.tsv", **loading_params
)
other_changes = np.loadtxt(
"data/2020/significantchanges_comments_master.tsv", **loading_params
)
# Filter
biggest_impact = biggest_impact[biggest_impact != '']
other_changes = other_changes[other_changes != '']
# Re-order
rng = np.random.default_rng(0xDEADC0DE)
rng.shuffle(biggest_impact)
rng.shuffle(other_changes)
# Reporting values
glue(
'num_biggest_impact',
biggest_impact.shape[0],
display=False
)
glue(
'num_other',
other_changes.shape[0],
display=False
)
To conclude the survey, we asked participants to share their thoughts on what changes to NumPy would have the most significant impact for them as users.
Biggest Impact#
We asked survey participants the following question:
What single immediate change to NumPy would bring the most value to you as a NumPy user?
The responses of the 239 survey participants who answered this question are listed below.
Show code cell source
gen_mdlist(biggest_impact, "biggest_impacts_list.md")
Expand to see responses!
Comments |
---|
simple possibility to use different CPU cores for parallel different arrays calculating without complex sintax |
Greater reach to users in general. (Original in Spanish: Mayor alcance a las usuarias en general.) |
Automatic differentiation |
Better error messages and error handling. Sometimes numpy has incorrect statements and includes text not intended for the error message. |
User-defined scalar types (e.g. high precision floats) |
consistency with xarray api, migration path to using numpy.random with Jax |
More speed! I use numpy because it’s fast - I want more of that. Also articles and tips on how to use performance features (out kwarg comes to mind) |
Better bug tracking (current number of open but duplicated bugs in GitHub is excessive) |
Unite Numpy and Scipy in a single project / library. (Original in Spanish: Unir Numpy y Scipy en un único proyecto / librería.) |
A clearly defined and smaller API that is 100% compatible with Dask, CuPy and other array libraries. |
signal processing and filter design tools (lighter weight versions of the tools in scipy) |
Not sure. I generally love Numpy! Maybe including the Minuit optimization package more formally into the numpy ecosystem. |
a ruthless standardization on snake_case methods instead of “sometimes it’s .foobar() and sometimes it’s .foo_bar()” |
documentation should be more interactive |
Most of the documentation examples are extremely concise, and only illustrate one to a few uses of an API. |
Easier/native reading of Fortran binary files |
Full fledged typing (mypy) support, with dimensions, shape and dtype. |
newcomers tutorials |
Best practices |
Better type annotations |
Proper methods for 1:1 image display in matplotlib |
Examples and documentation |
small matrix optimizations |
a homogeneous use of size and shape parameters :p |
Versatile boundary conditions for time integration |
Type hints |
Parallelization |
more How Tos |
even better docs |
COMMENTS |
Improvements in performance |
More examples of use with visualisation tools |
I would love to see some kind of series of videos or webinars teaching numpy |
Interoperability with other low level array libraries. |
GPU |
Documentation accessibility |
Documentation improvement to add clearer examples (maybe with some visualizations) |
A more intuitive dtype system. |
less pickiness about what kind of sequence of things a function (array, list, tuple …) can accept (this may be a general Python issue, though.) |
Clearer documentation |
Add homogeneous transformations |
Stop unwrapping zero-dimensional arrays into scalars. |
GPU support |
Easier docs for multidimensional operations (from stack/roll to whatever) |
Matrix operations |
Parallelism in summations. (Original in Spanish: Paralelismo en sumatorias.) |
Multi-threading by default |
Maybe easier way to move data between library especially deep learning. |
Integration with units libraries |
Plotting data easily |
Better string array support and performance |
More examples of its use (particularly for my specific domain). |
Documentation |
Clearer opportunities to give back to the community |
np.nan for int arrays |
Column vectors as default as opposed to 1d vectors. |
Multithreaded einsum |
closer coordination with pandas |
Improvements in Documentation |
CUDA backend like jax |
As optimization libraries |
More tutorials and use cases for linear algebra |
Better array reps in jupyter ? |
Greater integration with Python. (Original in Spanish: Mayor integración con Python.) |
became a framework |
Nullable integers |
numpy.dot should work on arrays of shape (…,n) x (n,…). |
Speed |
Increased adoption by other frameworks of array_function protocol |
Smaller steering committee |
Type annotations. (But that’s coming, from what I hear!) |
More docs |
randomized linear algebra |
Documentation and Examples |
An API/set of hooks to allow functions like |
Faster FFTs. |
Comprehensive documentation |
I run into floating point rounding errors often, sometimes that cause large bugs. This seems to stem from np.loadtxt. |
F2py handle Fortran 2008 and be thread safe |
NEP 21 |
Integrate quaternions as a basic type. (Original in Spanish: Integrar los cuaterniones como un tipo básico.) |
beginner-friendly numpy is too comp. |
Easy but efficient parallelism (like Mathematica’s ParallelMap). |
Stochastic linear algebra; specifically the ability to find the determinant of a LinearOperator. This might fit better in SciPy than NumPy |
why zeros() uses shape, while randint() uses size? I always forget which is which |
Tools for simulations (Monte Carlo method, generation of random numbers). (Original in Spanish: Herramientas para simulaciones (método Monte Carlo, generación de números aleatorios). ) |
Creation of a reliable binary format storage option. |
Faster masked arrays |
As a user, probably improved clarity/consistency of the documentation |
Increase the types of special functions. (Original in Japanese: 特殊関数の種類を増やします.) |
Static typing |
Better native GPU integration |
faster small arrays |
changing numpy’s name to np!!! |
Better documentation of packages. |
cleaner separation with scipy of FT - 2 equivalent modules with differences ! |
autodiff [but I am not sure I’d want it in numpy!] |
copy vs original array specifications |
GPU backend :) |
Fixing masked array and making them “first-class citizens”. |
Something Community-wise, I’m sure. |
new nditer C API support in Cython |
Support for hardware accelerators |
Performance comparable to rust ndarray (in my experience when used correctly Rust ndarray is ~2x faster) |
Add an HOWTO example to documentation on how to extend Numpy with a gufunc written in Cython |
this is a big ask, but it would be nice if numpy could run on gpus |
processing speed. (Original in Japanese: 処理速度.) |
A document describing best practices for using NumPy for scientific computing, targeted to a researcher using the package with limited computer science knowledge. |
Efficiency |
integer calculations like prime factorization |
how ndarray is displayed as a 2D list not matrix |
Adding the option to use more functions as methods. For many operations (sum, max, argmax, real/imag…) we have the option to access them as functions or methods. Many others (abs, angle, diff, sin, cos…) can only be used as functions. Sometimes it would be cleaner to write code with these as methods. |
Some statistical or linear algebra functions are both in scipy and numpy, this is confusing for me. |
Defining a numpy array with a string index |
optional parallel computing |
even better integration with other toolkits |
It’s hard to say. Numpy is probably my favorite library ever. If I was forced, maybe a course developed for both new and intermediate users. |
more robust documentation |
Improved performance. (Original in Japanese: パフォーマンスの向上.) |
GPU usage |
more examples in the wild. This is largely out of numpy’s control though. |
ability to run on GPU. |
More tutorials |
Make numpy.unique() handle objects arrays containing None again - this was working in Py2. |
anything to help bring people update with confidence - the BDFL for my project still uses py27, and it’s been pulling teeth to get him to 1.14. Even though I use 1.16 and 1.18 depending on the project, he is nervous because he remembers ‘that one time numpy changed how views are handled’ |
Providing the features that would stop PyTorch, TensorFlow, JAX, etc. from reimplementing NumPy and fragmenting the ecosystem. I think this is only CPU/GPU transparency (i.e. absorb CuPy). We don’t want to go back to the days of Numeric vs NumArray! |
Tutorials and more examples in documentation |
Improved seo that puts the latest docs at the top of searches. I frequently will end up with links to 1.15 docs |
portability |
Custom dtypes |
linear interpolation along an axis |
Be able to transform a 1D horizontal array to a 2D vertical array with the ‘.T’ transform. |
Better documentation. Examples. And explanation of underlying logic. It’s already good. But always could be better |
Speed and ease of use. (Original in Spanish: Velocidad y facilidad de uso.) |
More readable documentation would be welcome but otherwise NumPy is awesome! |
np.unique should accept a tolerance keyword that treats floats as the same if they differ by less than the tolerance. |
best practice and performance comparison of optimal/sub-optimal usages, and tutorial/documentation in this direction |
Documentation (tutorials) |
This might not be possible but having a fast way to iterate through arrays in a python for loop would make some operations easier. |
I can proudly say all the improvements I want to see (in things like docs) would be large, no low hanging fruit. |
Manipulation of ndarrays (indexing into, reshaping, etc.) could stand to be a little more transparent. |
more error messages for debugging |
I usually switch away from Numpy when my arrays contain strings. Perhaps there is a better way? |
Mentorship (stronger involvement in NumPy). Some less used features are completely unknown to me and it is hard to find tutorials/materials on them besides the documentation. |
Multithreaded functions |
Better documentation |
It would be really nice to have an api from numpy that evaluated the performance costs/benifits between different function calls with some input data, (like np.mat vs np.array, or np.dot vs np.einsum). It would make it easier to compare and see what I should be using in a specific case |
More functionalities for images 2d and 3d |
Names dimensions |
Low level explainations |
more documentation for advanced users for maximum performance |
I would like an explicit pointer syntax |
Static type hints |
Performance |
Ragged arrays/dtypes |
Working with JAX to add the numpy protocols. Then I can really use either library however I want! |
N-D linear interp |
Adding a “a.b” notation for dot products |
low-level parallel computing |
FASTER |
Clear and concise concatenation of 1D arrays to form a shape (N, 2) array. Currently using `np.vstack((…)).T’. |
Some finances module, but other than that is awesome as it is now |
Better tutorials and or easier way to create ufuncs |
rational number support with arbitrary capacity (int8, int16, etc). Need this for chemical stoichiometry calculations, specifically for calculating nullspace of stoichiometry matrix. |
Make the API reference less ad-hoc. See the Java docs for the ideal model. |
Faster multi-threaded operations (but this is out of scope and I’m happy using other libraries) |
[honestly it’s perfect] |
labelled arrays |
CUDA integration… |
A place for writing and submitting tutorials on how to implement things in numpy, and ways to link numpy functions to these tutorials. |
Support to visualize data (matplotlib often too complicated) |
GPU usage |
optimization |
Parallelization features |
Better documentation of linear algebra wrappers |
JIT |
separate the C code from the python code: less extensive use of the CPython C-API |
More visualization tools |
Support for type annotations |
NEP-35 and NEP-37 widespread adoption |
.index() … I’ve been seriously considering dropping numpy entirely in favour of pytorch over this, and frankly given how long it’s been I think it might be prudent to do so even if numpy added .index() today. |
More and better examples of using Numpy with more realistic data. (Original in Spanish: Más y mejores ejemplos de uso de Numpy con datos más realistas.) |
Alternatives to very large arrays (memory error). (Original in Spanish: Alternativas a arrays muuuy grandes (memory error).) |
Contract Simplification (mainly the sugar side of things) |
Weighted quantiles. I’m working on it |
Packaging of mkl libraries other than conda (wheel). (Original in Japanese: conda以外のmklライブラリのパッケージ化(wheel).) |
CUDA |
GPU support |
Better modern Fortran support in f2py |
A more user friendly vector class for linear algebra |
Synchronization between numpy.linalg and scipy.linalg. |
Consistent null value handling bumpy array |
Easier to understand documentation |
Better performance (paralelization) |
(py)FFTW backend |
Updated documentation for f2py |
A more consistent API, perhaps? (Original in Portuguese: Uma API mais consistente, talvez?) |
Add a way to keep track of units and to display answers with units |
documentation |
Documentation |
Usability. Make it simpler to use |
More speed ;) |
Performance boosts using inherent parallelism. |
Have a better documentation and tutorials. |
Better examples on doc pages. Almost always I have to check stackoverflow to understand the function better. |
Better control of array memory. |
Language-independent API |
Performant vectorisation |
A clarification in the function documentation to quickly know if it works in view or in copy. (Original in French: Une clarification dans la documentation des fonctions pour savoir rapidement si elle travaille en vue ou en copie.) |
Easy Documentation.. New learning is difficult with the current documentation model |
An easier way to handle arrays larger than memory |
better documentation, with more examples and use cases. |
Give more examples along with the documentation, give use cases, redesign docs page |
More integration with numba jit & cuda |
Better tutorial/documentation on how to efficiently use numpy features (ufunc etc.) |
More extensive and tutorial like documentation like stack overflow is with a continuous example |
support NA/missing values |
Increased random support. (Original in Spanish: Mayor soporte de random.) |
Why do you speak in feminine? (Original in Spanish: Por que habláis en femenino?) |
Codifying a “minimal NumPy” |
Would love a feature to extract both the min and max of an array (with an optional axis parameter) in one stride |
GPU |
Multithreaded 2 and 3 dimensional FFTs |
Adding the feature I requested |
Making faster. Python is inefficient and Numpy does not help by default. |
Better alternative for SWIG to wrap a proprietary I/O library written in C++ |
I think your masked array implementation is kind of clunky. The relationship between the mask and the underlying data array can get confusing. In particular, the behavior of the fill value is confusing. Setting something to the fill value in the data array doesn’t change the mask. Changing the mask doesn’t seem to update the data array. It’s been a while since I’ve had to deal with this issue, but it can get confusing. |
Allowing users to perform operations with one dimension removed. Eg adding a matrix of (3,4) to a vector of shape (3,) |
I would like documentation in Spanish in the most complex areas. (Original in Spanish: Me gustaria documentacion en español en las areas mas complejas.) |
Clearer separation between numpy and scipy in overlapping domains (linalg comes to mind) |
In-built visualization support for NumPy arrays. Would make it easier to visualise high dimensional arrays. |
allowing to slice an array with another array |
numpy <—> netCDF examples. I know how to do it, but “exchange” between formats would be better documented |
more hand-on with simple level 100 to 500 |
Way to access specific parts of the library since putting numpy in production is heavy. (Original in Spanish: Manera de acceder a partes específicas de la librería ya que poner numpy en produccion es pesado.) |
Improve performance |
ONNX support |
Became more Developer friendly |
GPU acceleration |
Other Significant Changes#
Finally, we asked participants to share any other changes that would significantly improve NumPy. The responses of the 110 participants who answered this question are listed below.
Show code cell source
gen_mdlist(other_changes, "other_changes_list.md")
Expand to see responses!
Comments |
---|
adaptability |
Documentation in Spanish |
I do not understand why np.random.rand(10,10) is okay, but np.ones/zeros(10,10) is not. |
Direct GPU support, without using other packages like nuba |
kill pandas |
Javascript API AND GPU support |
GPU support. Other languages support (Rust, PHP, Javascript) |
built in parallelization over machines |
Restructure and simplify the documentation |
Reduction on the size, splitting parts of niche modules into stand alone projects |
Type annotations, they help a lot to streamline development in supporting IDEs |
GPU adaption, optimisation tools |
Add static types |
I like this idea of a mentorship program to contribute :) |
Ability to run in low computing resource environments. (Original in Spanish: Capacidad de correr en entornos de bajos recursos computacionales.) |
Memory mapped numpy array and support B for custom hardware |
Capacity improvements and interpolation. (Original in Spanish: Mejoras en la capacidad e interpolación.) |
Non pickle serialization |
More communication on quality and usage |
user gufuncs |
Tutorials to improve performance without being an expert C or Fortran user. (Original in Spanish: Tutoriales para mejorar rendimiento sin ser usuario experto de C o Fortran.) |
Although this is probably not too feasible, some limited ability to write “non-vectorised” multi-line maths/logic in simple loop-like structures (e.g. where perhaps tools like Numba might be overkill) and have it optimise the loop overhead (e.g. like some sort of local static typing) would be quite nice to see. |
make loadtxt and savetxt more symmetrical and flexible |
Automatic differentiation |
Cython interaction is sometimes awkward (should I use Cython memoryviews or an ndarray?) |
Tighter integration / support for numpy extensions like numba and cupy |
From my point of view, numpy is most important as the base on which other scientific packages are built. The ones I use most are scipy.stats, astropy, and pandas. In that respect, I don’t see a need for significant changes in numoy |
More speed ;) |
Implicit support for specialized compute hardware (GPU) |
A syntax more similar to that of R. (Original in Portuguese: Uma sintaxe mais similar a do R.) |
Nothing: NumPy is excellent! |
Document the errors |
clearer documentation of every array manipulations |
publish a definitive set of APIs |
Real compatibility with PyPy |
Better documentation |
I love numpy, thanks for all your hard work! |
GPU support would speed up some computations on large arrays. |
Faster imports, less reliance on conda-forge |
More widespread adoption in universities. |
MICROPYTHON |
array_module |
Portability and independent |
Support for cuda or OpenCL technologies should be more transparent. (Original in Spanish: El soporte para tecnologías tipo cuda u OpencL debería ser más transparente.) |
What’s the deal with numpy.matrix? |
more elaborate examples |
Support for Mypy |
and parallel and distributed capabilities, not sure if they already available. |
*beginner-friendly numpy is too comp. |
Better type system similar to that of Julia |
Soliciting materials for tutorials from well versed users |
Better IDE Autocomplete |
outer products for vectors are still quite confusing |
Expanding the number of interpolation methods. (Расширение количества методов интерполяции |
Maybe some graphics in the website to clarify multidimensional operations |
Something like Google’s JAX, with differential functionality and better speed. |
more transparent numba (or numba like) integration |
clear up the confusing situation with matrices and arrays or at least explain it more thoroughly |
Differential equations |
Just support for type annotations |
Cleaner C++ interface |
Data types of missing values other than numeric types. (Original in Japanese: 数値型以外の欠損値のデータ型.) |
Adding autodiff, enabling named tensor axes (even if not “labelled”, at least “named”, to keep tensor dimensions’ semantic meaning straight) |
GPU support |
Modern, more readable Sphinx theme. Deprecate unused or out of scope parts of the library. |
I know you’ll never do it, but the matmul operator is nearly useless. I have no control as to whether the user passes in a matrix or scalar, and in many instances both are entire correctly. So my code is still only using dot. I’ve read the arguments of ‘only one way’ and it’s bogus. We don’t have different operators to add int int vs float float, nor should we. So I have dot(F, P).dot(Q) everywhere instead of F@P@Q. It is so hard to read, and prone to error (F.dot(P) can fail if F is not a matrix). I initially went through a bunch of code replacing dot with @ but had to get rid of them all because I was getting endless exceptions. This is sw to do math - please let us express it in a reasonable manner! And make these comment boxes bigger! A one line box to enter a ‘significant’ change? |
Automatic differentiation |
More understanding of low level functions |
Performance tests related to the different ways of programming with Numpy. (Original in Spanish: Pruebas de rendimiento relacionadas con las distintas formas de programar con Numpy.) |
view ndarray as matrix like in matlab or R |
distributed computing |
sincos implementation, exp(1j * x) implementation, vectorized transcendentals (e.g., using Intel SVML/MKL for exp, sin, etc) |
enhancements to object arrays |
Custom FFT kernels |
Better coordination of array-like types throughout the scientific ecosystem |
Native numexpr-ike functionality |
Hpy api for integration with pypy |
Optional auto-parallelization |
Arbitrary precision and the addition of physical constants (although that’s just for convenience). |
Guided examples |
modularisation |
Emphasize on performance! And keep up good work - Numpy is great. |
Interfaces a otros lenguajes |
Combine methods/algorithms under single module to reduce imports (and googling). |
It occasionally breaks or installs incorrectly on windows and with VSCode. Numpy experts might do more to smooth over this issue |
Cleaner documentation with more cross linking |
Progress the masked array topic would be great, there were ideas on a replacement, not sure what the solution is but a more versatile and efficient solution would be awesome. |
GPU computing |
More cohesion? - “one right way to do things” (scalars), actually removing np.matrix, better support for @, etc. |
Make the mean and standard deviation functions account for nan values like the nanmean functions. |
Better integration with charts and big data. (Original in Spanish: Mejor integración con gráficos y grandes volúmenes de datos.) |
Performance efficiency |
GPU support |
more integration with pandas, although this one is the superior one. |
Julia broadcasting syntax would be AWESOME! No ideas how to make happen. |
Having a wider community (both in numbers and in the diversity of its members) |
Support for ragged arrays |
Just better documentation |
Physical units management. |
Even better interoperability with other array libraries. |
GPU support |
Full support to type-hinting |
nothing, you all are doing great with this project, thanks for your awesome work |
np.nan for int arrays |
Computation speed |
rewrite the project in cpp instead of c and drop fortran |
High-level APIs |
Improved documentation on concepts and theory behind NumPy |
Vizualization of NumPy documentation by using “tree” of modules and functions. Also it would be usefull if new branches of the “tree” I didn’t see yet, would be marked with some color |
More functions designed for sparse arrays |