Usage

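import numpy as np
import matplotlib.pyplot as plt

# `glue`, `gluval`, and `flatten` are used below but not defined in this
# section; they are assumed to be imported or defined elsewhere in the notebook.

fname = "data/2021/numpy_survey_results.tsv"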
column_names = [
    'version', 'version_other', 'primary_use','use_freq','components','use_c_ext',
    'using_random', 'bug', 'bug_resolution', 'bug_resolution_other',
    'unsolvable', 'unsolvable_resolution', 'unsolvable_resolution_other', 
    'issues','issues_other',
    'deprecation_short', 'deprecation','deprecation_other'
]
featdep_dtype = np.dtype({
    "names": column_names,
    "formats": ['U1024'] * len(column_names),
})

data = np.loadtxt(
    fname, delimiter='\t', skiprows=3, dtype=featdep_dtype,
    usecols=[22,23,24,28,29,30,73,74,75,76,77,78,79,80,81,82,83,84],
    comments=None, encoding='UTF-16'
)

This section covers questions intended to give insight into new feature adoption, issue resolution, and the length of deprecation cycles.
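
The loading cell at the top of this section reads a subset of columns into a NumPy structured array whose fields are all fixed-width strings. A minimal sketch of the same pattern on a small in-memory table follows. The column names, the sample rows, and the `U32` field width are assumptions made for illustration and are not taken from the survey file.

```python
import io

import numpy as np

# Hypothetical tab-separated sample: one header row, then three responses.
# The second response leaves "primary_use" blank.
sample = io.StringIO(
    "id\tversion\tprimary_use\tuse_freq\n"
    "1\t1.21\tWork\tDaily\n"
    "2\t1.19\t\tWeekly\n"
    "3\t1.21\tSchool\tDaily\n"
)

names = ["version", "primary_use", "use_freq"]
# Every field is stored as a fixed-width unicode string.
sample_dtype = np.dtype({"names": names, "formats": ["U32"] * len(names)})

sample_data = np.loadtxt(
    sample,
    delimiter="\t",
    skiprows=1,          # skip the header row
    dtype=sample_dtype,
    usecols=[1, 2, 3],   # one file column per dtype field
    comments=None,       # do not treat '#' as a comment marker
)

# Fields are accessed by name; blank answers are empty strings.
answered = sample_data["primary_use"][sample_data["primary_use"] != ""]
```

Storing every answer as a string keeps blank responses as empty strings, which is what the `!= ''` filters in the cells below rely on.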

NumPy Primary Use

418 (80%) respondents provided information about the primary context in which they use NumPy. Almost 3/4 of respondents use NumPy for work.

uses = data['primary_use'][data['primary_use'] != '']
labels, cnts = np.unique(uses, return_counts=True)

fig, ax = plt.subplots(figsize=(12, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("NumPy Primary Use");
fig.tight_layout()

glue(
    '2021_num_primary_use_respondents',
    gluval(uses.shape[0], data.shape[0]),
    display=False
)
../../_images/96faa4e2c00b35c48404317e0631f51431d534b856bd2483c6f44050d1684b04.png
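
The "418 (80%)" style figures quoted in this section appear to come from counting the non-blank answers to a question and dividing by the total number of survey participants. The sketch below shows that calculation together with the `np.unique(..., return_counts=True)` tally used for the pie charts. The helper name `count_with_percent` and the sample answers are hypothetical; the notebook's own `gluval` helper is not shown in this section and may format its output differently.

```python
import numpy as np


def count_with_percent(num, total):
    """Format a count and its share of the total, e.g. '3 (60%)'."""
    return f"{num} ({100 * num / total:.0f}%)"


# Hypothetical answers; empty strings mark skipped questions.
answers = np.array(["Work", "", "School", "Work", ""])

# Drop blank answers, then count how often each remaining label occurs.
responded = answers[answers != ""]
labels, cnts = np.unique(responded, return_counts=True)

# Response rate relative to everyone who took the survey.
summary = count_with_percent(responded.shape[0], answers.shape[0])
```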

NumPy Frequency of Use

418 (80%) respondents provided information about how often they use NumPy. Most respondents use NumPy on a daily or weekly basis.

use_freq = data['use_freq'][data['use_freq'] != '']
labels, cnts = np.unique(use_freq, return_counts=True)

fig, ax = plt.subplots(figsize=(12, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("NumPy Frequency of Use");
fig.tight_layout()

glue('2021_num_freq_respondents', gluval(use_freq.shape[0], data.shape[0]), display=False)
../../_images/876378ed9e0d9e510a55d5c2da8c2cad9ee8c0346839144f11caf9b6d4cef67b.png

NumPy Version

NumPy 1.21 was the latest stable release at the time the survey was conducted. 50.7 percent of respondents report that they primarily use an older version of NumPy.

vers = data['version'][data['version'] != '']
labels, cnts = np.unique(vers, return_counts=True)

fig, ax = plt.subplots(figsize=(12, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("NumPy Version");
fig.tight_layout()

# Percentage of users that use older versions
older_version_usage = 100 * cnts[-8:-2].sum() / cnts.sum()
glue('2021_older_version_usage', f"{older_version_usage:1.1f}", display=False)
../../_images/c3c1d0e7c0a6ebd31942b0004b906369c538747bd216717cb5a84734f63cfebb.png
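
The older-version share is computed in the cell above by slicing the sorted counts by position, which depends on exactly which version labels appear in the data. One possible alternative, sketched here with hypothetical version labels and a hypothetical helper `is_older`, selects labels by comparing parsed version numbers instead.

```python
import numpy as np

LATEST = (1, 21)  # latest stable release at the time of the survey


def is_older(label, latest=LATEST):
    """Return True if a 'major.minor' label precedes the latest release."""
    try:
        major, minor = (int(part) for part in label.split(".")[:2])
    except ValueError:
        return False  # free-text labels such as "Unsure" are not counted
    return (major, minor) < latest


# Hypothetical responses, for illustration only.
vers = np.array(["1.21", "1.19", "1.20", "1.21", "Unsure", "1.16"])
labels, cnts = np.unique(vers, return_counts=True)

# Select the labels that parse to a version older than LATEST.
older_mask = np.array([is_older(label) for label in labels])
older_share = 100 * cnts[older_mask].sum() / cnts.sum()
```

As in the original cell, the denominator is the count of all non-blank answers, so labels that do not parse as versions still count toward the total.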

NumPy Components

NumPy includes many subpackages for specific scientific computing tasks, such as random number generation or Fourier analysis. The following figure shows the percentage of respondents who reported using each NumPy subpackage.

components = data['components'][data['components'] != '']
num_respondents = len(components)
# Process components field
all_components = []
for row in components:
    all_components.extend(row.split(','))
all_components = np.array(all_components)
labels, cnts = np.unique(all_components, return_counts=True)
# Sort ascending by count so the most-used component is drawn at the top
I = np.argsort(cnts)
labels, cnts = labels[I], cnts[I]
cnts = 100 * cnts / num_respondents

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(np.arange(len(cnts)), cnts, align='center')
ax.set_yticks(np.arange(len(cnts)))
ax.set_yticklabels(labels)
ax.set_xlabel("Percentage of Respondents")
ax.set_title("Use-Frequency of NumPy Sub-Packages")
fig.tight_layout()
../../_images/6f95e11be0a9d154c1bab491f533b804fdcf583c5b2e3e336cf9ca4d0ac9d7dd.png
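
Because respondents could select several subpackages, each answer is a comma-separated string, and the percentages are taken relative to the number of respondents rather than the number of selections. A minimal sketch with made-up answers shows why the bars can add up to more than 100%:

```python
import numpy as np

# Hypothetical multi-select answers, one string per respondent.
components = np.array(["random,linalg", "linalg", "fft,linalg,random"])
num_respondents = len(components)

# Split each answer into individual selections and pool them.
selections = []
for row in components:
    selections.extend(row.split(","))

labels, cnts = np.unique(selections, return_counts=True)
# Share of respondents who selected each component.
pct = 100 * cnts / num_respondents

# Ascending order puts the largest value at the top of a horizontal bar chart.
order = np.argsort(pct)
labels, pct = labels[order], pct[order]
```

Here every respondent selected `linalg`, so its share is 100% even though other components were selected as well.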

NumPy C-Extensions

363 (70%) participants shared whether they (or their organization) use custom C-extensions via the NumPy C-API (excluding Cython). Only about 7% of respondents report using them.

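# Keep blank answers here so that non-respondents get their own pie slice
uses_c_ext = data['use_c_ext']
# Non-blank answers only, used for the response count glued below
use_c_ext = data['use_c_ext'][data['use_c_ext'] != '']
labels, cnts = np.unique(uses_c_ext, return_counts=True)
# np.unique sorts the empty string first; relabel that slice
labels[0] = 'No response'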

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("Use of NumPy C-Extensions");
fig.tight_layout()

glue('2021_num_c_ext', gluval(use_c_ext.shape[0], data.shape[0]), display=False)
../../_images/f015c229b004bcce5280d8bc4f021b0c77c3cb37fa4faac55de93e4d0c283238.png

New numpy.random Adoption

A new API for random number generation was added to numpy.random in version 1.17. We asked survey participants whether they were using it. Of the 522 survey participants, 233 (45%) answered this question.

rand = data['using_random'][data['using_random'] != '']
labels, cnts = np.unique(rand, return_counts=True)

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("Use of Random API");
fig.tight_layout()

glue(
    '2021_num_random_users',
    gluval(rand.shape[0], data.shape[0]),
    display=False
)
../../_images/7fc3198822cc9c20564f3089c08eacaa282c92dbf5e6ea28a535623892703449.png

Handling Issues

We wanted to get a sense of how often users experience issues with NumPy, so we asked the following question:

In the last year, have you experienced problems in code you’ve written stemming from a problem in NumPy?

Of the 522 survey participants, 331 (63%) responded to this question.

bug = data['bug'][data['bug'] != '']
labels, cnts = np.unique(bug, return_counts=True)

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%', labeldistance=None)
ax.legend()
ax.set_title("Experienced NumPy Issues");
fig.tight_layout()

glue(
    '2021_bug_reporters',
    gluval(bug.shape[0], data.shape[0]),
    display=False,
)
../../_images/51027931a4861521e61bd2c98b5253edd03c33d9d75bced13b320c824110378d.png

We asked those who reported they experienced issues what action(s) they took to resolve the issue.

bug_resolution = data['bug_resolution'][data['bug_resolution'] != '']
labels, cnts = np.unique(flatten(bug_resolution), return_counts=True)
I = np.argsort(cnts)
labels, cnts = labels[I], cnts[I]

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(
    np.arange(len(labels)),
    100 * cnts / bug_resolution.shape[0], 
    tick_label=labels,
)
ax.set_xlabel('Percentage of Respondents')
ax.set_title("Actions to Resolve NumPy Issue");
fig.tight_layout()
../../_images/a0df339cdcf4c613a157f0a90741136e91b9734b6c07acc015351edd00b2f670.png

Data Analysis with NumPy

As with the previous question, we tried to get a sense of how well NumPy meets users’ data analysis needs. We asked the following question:

In the last year, have you encountered a problem involving numerical data that you were unable to solve using NumPy?

Of the 522 survey participants, 328 (63%) responded to the above question, with 60 (18%) reporting that they’ve had a problem that they initially expected to be able to solve using NumPy, but were unable to do so.

unsolvable = data['unsolvable'][data['unsolvable'] != '']
labels, cnts = np.unique(unsolvable, return_counts=True)
num_yes = np.sum(unsolvable == 'Yes')

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("Experienced Data Analysis Issues");
fig.tight_layout()

glue(
    '2021_num_solvers',
    gluval(unsolvable.shape[0], data.shape[0]),
    display=False,
)
glue(
    '2021_num_unsolved',
    gluval(num_yes, unsolvable.shape[0]),
    display=False
)
../../_images/5b731efffe5cd52077c3e5e979cf6fe046baad3c4318c0e54f5901231e894ff9.png

We asked those who responded “Yes” to the previous question what action(s) they took to resolve the issue.

resolution = data['unsolvable_resolution'][data['unsolvable'] == 'Yes']
resolution = resolution[resolution != '']
labels, cnts = np.unique(flatten(resolution), return_counts=True)
I = np.argsort(cnts)
labels, cnts = labels[I], cnts[I]

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(
    np.arange(len(labels)),
    100 * cnts / resolution.shape[0], 
    tick_label=labels,
)
ax.set_xlabel('Percentage of Respondents')
ax.set_title("Actions to Resolve Data Analysis Issue");
fig.tight_layout()
../../_images/516d0d62b0d7a6b170bf5c089289390704f48be4921ac8a1d3de5ece8782dc47.png
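
The follow-up chart only includes people who answered “Yes” to the preceding question. Since all answers live in one structured array, a boolean mask built from one field can select rows of another. The two-field array below is a hypothetical stand-in for the survey data; its answer strings are invented for illustration.

```python
import numpy as np

# Hypothetical stand-in for the survey array (field names match the notebook).
toy = np.array(
    [
        ("Yes", "Used another library"),
        ("No", ""),
        ("Yes", ""),
        ("", ""),
        ("Yes", "Wrote custom code"),
    ],
    dtype=[("unsolvable", "U32"), ("unsolvable_resolution", "U64")],
)

# Keep follow-up answers only from respondents who said "Yes" ...
resolution = toy["unsolvable_resolution"][toy["unsolvable"] == "Yes"]
# ... and drop those who skipped the follow-up question.
resolution = resolution[resolution != ""]
```

Filtering in two steps keeps the denominator of the bar chart equal to the number of “Yes” respondents who actually answered the follow-up.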

Opening Issues

54 (10%) respondents reported having a problem with numerical data that they were unable to solve using NumPy, and did not open an issue. They were then asked why they did not open an issue for their particular problem.

open_issues = data['issues'][data['issues'] != '']
labels, cnts = np.unique(flatten(open_issues), return_counts=True)
I = np.argsort(cnts)
labels, cnts = labels[I], cnts[I]

fig, ax = plt.subplots(figsize=(12, 8))
ax.barh(
    np.arange(len(labels)),
    100 * cnts / open_issues.shape[0], 
    tick_label=labels,
)
ax.set_xlabel('Percentage of Respondents')
ax.set_title("Reason for not opening issue");
fig.tight_layout()

glue(
    '2021_num_open_issues',
    gluval(open_issues.shape[0], data.shape[0]),
    display=False,
)
../../_images/b796abff429977bdb619c2abe889c5931c872bea8fbc754853abc278a8336c2c.png

Deprecation Timeframe

We asked survey participants to share their opinion on the NumPy deprecation cycle, specifically:

NumPy normally has a two release cycle (1 year) deprecation policy. Do you think this is…

Of the 522 survey participants, 322 (62%) responded to this question.

current_dep = data['deprecation_short'][data['deprecation_short'] != '']
labels, cnts = np.unique(current_dep, return_counts=True)

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("Viewpoint on NumPy Deprecation Timeframe");
fig.tight_layout()

glue(
    '2021_num_dep_short',
    gluval(current_dep.shape[0], data.shape[0]),
    display=False
)
../../_images/08c32e70d9aa45a1bc86b7c3ffa773e1f42b4243c7acb96cf218509355e04621.png

We also asked the following:

What do you consider as a good deprecation time frame?

Of the 522 survey participants, 322 (62%) responded to this question.

depcycle = data['deprecation'][data['deprecation'] != '']
labels, cnts = np.unique(depcycle, return_counts=True)

fig, ax = plt.subplots(figsize=(8, 8))
ax.pie(cnts, labels=labels, autopct='%1.1f%%')
ax.set_title("Ideal Deprecation Timeframe");
fig.tight_layout()

glue(
    '2021_dep_opinions',
    gluval(depcycle.shape[0], data.shape[0]),
    display=False
)
../../_images/2547fbd27362b8dd40f8103f31c54d10372be4e95137122e1fc7957cf5213ca2.png