Hubble Abstracts

Tracking Hubble Proposals

The Hubble Space Telescope (HST) is one of the best tools we have in astronomy. Orbiting 540 kilometers above the Earth, Hubble sits above the atmosphere and obtains diffraction-limited observations of a wide range of objects. Each cycle (usually once a year), hundreds of astronomers request four to five times as much HST time and money as is available. HST was launched and is run by NASA, but the science operations are managed by the Space Telescope Science Institute. All the approved programs are listed on the data archive. For this analysis I downloaded all the abstracts and observing time associated with each cycle. Below we see the total number of programs per cycle. Click the buttons below to toggle between total and normalized representation.

The total number spiked in cycle 7 before settling to a fairly consistent value and then increasing slowly over the last few years. The spike in cycle 7 arises simply because that cycle was about twice as long as the other cycles (a consequence of the failure of the NICMOS instrument).

As for the percentage of proposals of different types, we see that after HST was launched there were far more "GTO" projects. These are Guaranteed Time Observer programs, which are given to the scientists who build the cameras and sensors on Hubble. After several years these programs are phased out, but new instruments have been added several times, and each addition shows up as a small increase in the GTO fraction.

Although the "Cal" projects (routine calibration proposals that help keep HST running smoothly) have remained mostly consistent over the years, the "misc" projects spike as new instruments are added. These miscellaneous projects are mostly commissioning work for the new instruments.

Finally, we see a growth in archival projects after a few cycles, because data is only available for archival research roughly a year after the initial observations.

I checked the MAST archive for all data related to every project, then summed the exposure time of every project in each cycle. This gives some sense of how much total telescope time was available during each cycle. The same large spike appears in the observing time for cycle 7.

Again, this spike is simply due to the length of cycle 7: GHRS failed in cycle 6 and NICMOS failed in cycle 7 (see here for more), which extended the cycle. Other than that cycle, the exposure time has stayed around the average of 18 million seconds per year (for reference, a year has 31.5 million seconds).
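As a rough illustration of the bookkeeping described above, here is a minimal sketch that sums exposure time per cycle and expresses a yearly total as a fraction of a calendar year. The record layout (pairs of cycle number and exposure seconds) and the function names are assumptions made for this example, not the format of the MAST archive, and the sample records are made up.

```python
from collections import defaultdict

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # about 31.56 million seconds


def exposure_by_cycle(records):
    """Sum exposure time in seconds per cycle from (cycle, seconds) pairs."""
    totals = defaultdict(float)
    for cycle, seconds in records:
        totals[cycle] += seconds
    return dict(totals)


def fraction_of_year(exposure_seconds):
    """Express an exposure total as a fraction of one calendar year."""
    return exposure_seconds / SECONDS_PER_YEAR


# Hypothetical records for illustration only, not real archive values.
records = [(6, 1200.0), (7, 900.0), (7, 2400.0)]
print(exposure_by_cycle(records))        # {6: 1200.0, 7: 3300.0}
print(round(fraction_of_year(18e6), 2))  # 0.57
```

With the 18 million second average quoted above, this works out to a little over half of the seconds in a year.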

Inequality of Hubble Time

Let's turn now to the number of proposals per person. There are 2484 different PIs listed in the Hubble archive. I counted the number of successful proposals for each PI and found that the most common count is one, followed by two, and so forth. Note that we are only counting people who have successful proposals; in reality, the vast majority of proposers likely have no successful proposals. Below you can see a histogram of the number of people with different numbers of proposals. Note the long tail.
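The counting step can be sketched in a few lines. This assumes the data has already been reduced to a list with one PI name per successful proposal; the names below are placeholders, and the function names are made up for this example.

```python
from collections import Counter


def proposals_per_pi(pi_names):
    """Count proposals per PI from a list with one PI name per proposal."""
    return Counter(pi_names)


def histogram_of_counts(per_pi):
    """Map 'number of proposals' to 'number of PIs with that many'."""
    return dict(sorted(Counter(per_pi.values()).items()))


# Placeholder names for illustration only.
pis = ["A", "B", "A", "C", "A", "D", "B"]
per_pi = proposals_per_pi(pis)
print(histogram_of_counts(per_pi))  # {1: 2, 2: 1, 3: 1}
```

Here two PIs have one proposal each, one has two, and one has three, which is a tiny version of the histogram shape described above.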

A small fraction of PIs hold a disproportionate share of the successful proposals. This can also be seen if we look at the cumulative distribution. The bottom 2000 PIs have 34% of all proposals; that is the same percentage as the top 111 PIs! One way to quantify this is the Gini coefficient, a popular measure of income inequality that is also used in astronomy to measure how spread out the light from a galaxy is. The Gini coefficient for Hubble proposals is 0.6. This is comparable to the income inequality of some of the most unequal countries in the world and much larger than the US income Gini coefficient of 0.4.
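For readers who want to reproduce this kind of number, here is a minimal sketch of the Gini coefficient using the standard rank-weighted formula on sorted values, plus a helper for the share held by the top k PIs. The function names are made up for this example, and this is not necessarily the exact estimator used for the 0.6 figure above.

```python
def gini(counts):
    """Gini coefficient of non-negative counts (0 means perfectly equal)."""
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive count")
    # Rank-weighted formula on ascending values, ranks starting at 1:
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def share_held_by_top(counts, k):
    """Fraction of the total held by the k largest entries."""
    ordered = sorted(counts, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


print(gini([1, 1, 1, 1]))                     # 0.0
print(gini([0, 0, 0, 10]))                    # 0.75
print(share_held_by_top([5, 1, 1, 1, 2], 1))  # 0.5
```

For a finite sample of n entries, this estimator tops out at (n - 1) / n when a single entry holds everything, which is why the second example gives 0.75 rather than 1.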

Who are these top proposers?

John W. MacKenty
Total abstracts: 106
CAL/WFC: 43, RPT/WFC: 1, SNAP: 2, GO: 7, GO/PAR: 3, CAL/NIC: 2, SM2/NIC: 12, CAL/WFC3: 34, SM4/WFC3: 1, GO/DD: 1
Edmund Nelan
Total abstracts: 98
CAL/AST: 53, SM3/FGS: 13, SM3/SC: 1, SNAP: 3, CAL/OTA: 9, GO/DD: 3, GO: 9, ENG/FGS: 3, ENG/AST: 3, SM4/FGS: 1
Keith S. Noll
Total abstracts: 88
GO/DD: 49, GO: 16, GO/CAR: 1, CAL/NIC: 4, SNAP: 6, SM3/NIC: 3, SM4/ERO: 9
John A. Biretta
Total abstracts: 86
GO: 19, CAL/WFC: 2, SM2/WF2: 8, GO/PAR: 1, CAL/WF2: 36, CAL/ACS: 2, ENG/ACS: 4, ENG/WF2: 3, CAL/WFC3: 6, CAL/STIS: 5
John Trauger
Total abstracts: 69
SMC/ERO: 1, GTO/WF2: 60, SMC/WF2: 5, SME/WF2: 1, GO: 1, GO/DD: 1

If we look at the five PIs with the most total abstracts, we see that most of them have many abstracts related to calibration and are associated with NASA and the Space Telescope Science Institute.

If we focus only on GO proposals, the PIs with the most awarded proposals are:

Howard E. Bond
Total abstracts: 52

Robert P. Kirshner
Total abstracts: 35

Julianne Dalcanton
Total abstracts: 31

Thomas R. Ayres
Total abstracts: 29

Marc Postman
Total abstracts: 27

Finally, what makes a good proposal? Proposers routinely ask for five times as much Hubble time as is available. With that in mind, there are many small things that proposers need to do to get their proposal accepted. One of these is to have a novel idea that will help push the field forward. To look for this novelty, I wanted to see how word use in the abstracts changed over the cycles. TF-IDF, or term frequency-inverse document frequency, can help with this. TF-IDF highlights the most common words in a given document (here, a document is all GO proposals in a single cycle) but down-weights words that are common across all documents (here, all proposals from all cycles). The result is that words that are relatively more common in a particular cycle show up, while words that are very common across all proposals do not.

To be clear, since we are mixing together all the proposals in a cycle, a word may have a high term frequency either because it is used heavily in a single proposal or because it was used often across many proposals in that cycle.
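The weighting described above can be sketched as follows. This is a minimal illustration, assuming one concatenated text per cycle, a simple lowercase tokenizer, and the textbook score tf * log(N / df); the exact tokenization and TF-IDF variant behind the table below are not stated here, so treat the function names and choices as assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def top_tfidf_terms(docs, k=8):
    """Return the k highest-scoring terms for each document.

    docs maps a label (for example a cycle number) to one string
    holding all abstracts for that label.
    """
    counts = {label: Counter(tokenize(text)) for label, text in docs.items()}
    n_docs = len(counts)
    # Document frequency: in how many documents does each word appear?
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    result = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores = {w: (n / total) * math.log(n_docs / df[w]) for w, n in c.items()}
        # Sort by descending score, breaking ties alphabetically.
        result[label] = sorted(scores, key=lambda w: (-scores[w], w))[:k]
    return result


# Toy documents for illustration only.
docs = {
    "cycle A": "galaxy galaxy hoped star",
    "cycle B": "galaxy exoplanet exoplanet star",
}
print(top_tfidf_terms(docs, k=1))
# {'cycle A': ['hoped'], 'cycle B': ['exoplanet']}
```

With this variant, a word that appears in every document gets an inverse document frequency of log(1) = 0, so it can never outrank a word that is specific to a few cycles.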

In the table below, each row shows the eight words with the highest TF-IDF scores. We can see some interesting trends stand out. The first cycle included the word 'hoped' a lot more than the later cycles - clearly the capabilities of Hubble hadn't been demonstrated yet.

Words like 'exoplanets' become very common around cycle 20. Many other words are the names of specific instruments (like wfc3) or the years in which the proposals were written. You can click on the words below to see all corresponding abstracts in the MAST archive.

Cycle 1	ag	discrimination	287	hoped	2251	esc	wf	hrs
Cycle 2	registration	r0	mination	rigid	egos	deter	esc	0005
Cycle 3	spectropolarimetric	serkowski	triton	ag	wf	venus	lpvs	n88a
Cycle 4	deter	rigid	lsa	4c41	esc	wf	1993j	0005
Cycle 5	chiron	triton	maunder	wfpc	fliers	phobos	kms	lyalpha
Cycle 6	3cr	ppne	so_2	inthe	db	kms	ofthe	lyalpha
Cycle 7	helical	triton	cavity	jvas	o_2	lyalpha	nic2	1996
Cycle 8	iso	axaf	m_odot	hdf	triton	lvc	fliers	lyalpha
Cycle 9	diffusion	sbs	venus	lyalpha	3he	593	macho	1543
Cycle 10	macho	blg	kms	em	mbh	lyalpha	eq	lbcgs
Cycle 11	fr	m_odot	wim	blg	2001	sbs	macho	ppns
Cycle 12	krypton	rls	sirtf	scuba	11b	lb	ceres	hood
Cycle 13	qlmxbs	pagb	protocluster	westerlund	lirgs	js	pox	fgs1r
Cycle 14	4449	bhxn	sscs	m32	2m1207	lsb	bss	egp
Cycle 15	lmxbs	scl	nic2	afterglows	fgs1r	ulirgs	irac	hook
Cycle 16	hn	irac	imbh	sings	lyr	auroras	fg	lbgs
Cycle 17	ppne	enceladus	polaris	hypervelocity	uvis	lbgs	cos	wfc3
Cycle 18	1796s	nyquist	adds	uvis	f275w	cos	f475w	wfc3
Cycle 19	4042s	adds	constancy	uvis	f275w	f475w	cos	wfc3
Cycle 20	conduction	exoplanets	wise	laes	2011	uvis	cos	wfc3
Cycle 21	uvis	initiative	igrm	alma	dm	lyc	cos	wfc3
Cycle 22	shield	2012	slsn	2013	2014	cgm	cos	wfc3
Cycle 23	2014	alma	uvis	exoplanets	cgm	2015	cos	wfc3
Cycle 24	exoplanet	exoplanets	juno	2017	2016	udgs	cos	wfc3
Cycle 25	exoplanets	lyc	uvis	exoplanet	cgm	udgs	cos	wfc3

Another way of looking at this change in word usage is to examine n-grams. Some of the same trends appear here; for example, 'exoplanets' becomes more common over time. This time I am looking only at whether the word appears in each abstract, so a word is counted once for each abstract in a cycle that contains it. If you want to, you can play around with the data yourself, just like the Google NGram search for books.
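The once-per-abstract counting can be sketched like this. The input layout (a dict from cycle number to a list of abstract strings) and the function name are assumptions for this example, and the abstracts are made up.

```python
import re


def abstracts_containing(word, abstracts_by_cycle):
    """Count, per cycle, the abstracts that contain the word at least once."""
    target = word.lower()
    counts = {}
    for cycle, abstracts in abstracts_by_cycle.items():
        hits = 0
        for text in abstracts:
            # A set of tokens means repeated uses in one abstract count once.
            tokens = set(re.findall(r"[a-z0-9_]+", text.lower()))
            if target in tokens:
                hits += 1
        counts[cycle] = hits
    return counts


# Made-up abstracts for illustration only.
sample = {
    20: ["exoplanets exoplanets transit", "a nearby galaxy"],
    21: ["Exoplanets around M dwarfs", "exoplanets!"],
}
print(abstracts_containing("exoplanets", sample))  # {20: 1, 21: 2}
```

Dividing each count by the number of abstracts in that cycle would give a fraction that is easier to compare between cycles of different sizes.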