UV luminosity density results at z > 8 from the first JWST/NIRCam fields: limitations of early data sets and the need for spectroscopy

We have derived luminosity functions and set constraints on the UV luminosity and SFR density from z ∼ 17 to z ∼ 8, using the three most-studied JWST/NIRCam data sets, the SMACS0723, GLASS Parallel, and CEERS fields. We first use our own selections on two independent reductions of these data sets using the latest calibrations. A total of 18 z ∼ 8, 12 z ∼ 10, 5 z ∼ 13, and 1 z ∼ 17 candidate galaxies are identified over these fields in our primary reductions, with a similar number of candidates in our secondary reductions. We then use these two reductions, applying a quantitative discriminator, to segregate the full set of z ≥ 8 candidates reported over these fields from the literature into three different samples, ‘robust’, ‘solid’, and ‘possible’. Using all of these samples, we then derive UV LF and luminosity density results at z ≥ 8, finding substantial differences. For example, including the full set of ‘solid’ and ‘possible’ z ≥ 12 candidates from the literature, we find UV luminosity densities which are ∼7× and ∼20× higher than relying on the ‘robust’ candidates alone. These results indicate the evolution of the UV LF and luminosity densities at z ≥ 8 is still extremely uncertain, emphasizing the need for spectroscopy and deeper NIRCam + optical imaging to obtain reliable results. Nevertheless, even with the very conservative ‘robust’ approach to selections, both from our own and those of other studies, we find the luminosity density from luminous (M_UV < −19) galaxies to be ∼2× larger than is easily achievable using constant star formation efficiency models, similar to what other early JWST results have suggested.

These early studies have returned very luminous and seemingly robust sources at z ∼ 10-13 (e.g. Naidu et al. 2022b; Finkelstein et al. 2022a; Bouwens et al. 2022a), and have also identified candidates out to redshifts as high as z ∼ 16-20 (e.g. Naidu et al. 2022a; Zavala et al. 2023; Yan et al. 2023). At the same time, some very massive sources have been identified at z ≥ 7 on the basis of what appear to be substantial Balmer breaks, suggesting substantial early mass assembly in the universe. These results are enigmatic, however, potentially exceeding the available baryons to form stars at z ≥ 13 (Boylan-Kolchin 2022; Naidu et al. 2022a, but see also Steinhardt et al. 2022; Inayoshi et al. 2022). Despite this flurry of new high-redshift sources, there have been substantial differences in the z ≥ 8 candidate galaxy samples identified in different studies over the same fields. The typical overlap between candidate lists in the earliest analyses was only ∼10-20% (at least in the initial versions of these papers). Broadly, such differences can be indicative either of substantial contamination in z ≥ 8 selections or high levels of incompleteness. In either case, the inferred luminosity function results could be substantially mis-estimated (by ∼0.3-0.5 dex). Overall, this entire issue poses a major challenge as we try to understand what is really happening in the first 400-500 Myr at z ≥ 10, where JWST can provide unique new insights into galaxy buildup.
The primary purpose of the present paper is to investigate the evolution of the UV luminosity density and star formation rate density for galaxies at z ≥ 8 from early JWST data sets, while looking closely at the overall range of constraints allowed based on current observations. Key to doing this in a quantitative way is to provide an assessment of the first selections of z ≥ 8 candidate galaxies over the first JWST data sets and any updates to these selections that have become possible due to improvements, e.g. from improved zeropoint calibrations (e.g. Adams et al. 2023), and to identify approaches that lead to the most robust samples.
To this end, we will make use of two independent, recent reductions of the available JWST data that use the latest calibrations and experience in dealing with artifacts, over the three most well-studied fields: the SMACS0723 cluster (Pontoppidan et al. 2022), four NIRCam pointings from the Cosmic Evolution Early Release Science (CEERS) fields, and the NIRCam GLASS parallel field (Treu et al. 2022). Not only do we select our own set of z ≥ 8 candidates from these fields, but we also make an assessment of essentially all candidates from previous studies of these same fields, to gauge how well individual selections appear to be working and to characterize potential progress. In doing so, we present community LF results, showing the impact of including candidates of various quality on the LF and luminosity density results at z > 8. We will also investigate the extent to which an emergent picture is forming on the basis of the latest results from the collective analyses.
The plan for this paper will be as follows. In §2, we summarize the data sets utilized in this paper and our procedure for performing photometry of sources in those data sets. In §3, we present our procedure for selecting z ≥ 7 sources from the data sets we examine, the z ∼ 8-17 samples we derive, and our z ∼ 8, 10, 13, and 17 LF results, while performing a detailed assessment of other candidate z ∼ 8-17 galaxies in the recent literature. In §4, we use those results to derive LF results based on our own selections and our community samples, discuss the results in §5, and then provide a summary in §6. For convenience, the HST F435W, F475W, F606W, F814W, F098M, F125W, F140W, and F160W filters are written as B435, g475, V606, I814, Y098, J125, JH140, and H160, respectively, throughout this work. Also, we quote results in terms of the approximate characteristic luminosity L*(z=3) derived at z ∼ 3 by Steidel et al. (1999), Reddy & Steidel (2009), and many other studies. A Chabrier (2003) initial mass function is assumed throughout. For ease of comparison to other recent extragalactic work, we assume a concordance cosmology with Ω_m = 0.3, Ω_Λ = 0.7, and H_0 = 70 km/s/Mpc throughout. All magnitude measurements are given using the AB magnitude system (Oke & Gunn 1983) unless otherwise specified.

Table 1. Estimated 5σ depth [in mag] of the three JWST fields (SMACS0723, CEERS, and Abell 2744 parallel) we use in searching for galaxies at, nominally, z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17. A 0.35"-diameter aperture is adopted for the photometry in computing the depth in each field. These depths include a correction for the flux in point sources lying outside a 0.35"-diameter aperture and thus correspond to the total magnitudes of point sources that would be detected at 5σ for a given band.

Data Sets
We make use of the three most studied NIRCam data sets in constructing a selection of z ≥ 7 galaxies from current JWST observations, i.e., the ∼12-hour SMACS0723 cluster field featured in the JWST early release observations (Pontoppidan et al. 2022), the 4-pointing NIRCam observations taken as part of the CEERS early release science program, and the sensitive NIRCam parallel observations as part of the GLASS early release science program (Treu et al. 2022).
These three fields cover a total area of ∼51 arcmin². The approximate 5σ depths of these data sets reach from ∼28 to 29.2 mag and are presented in detail in Table 1. These depths are derived by measuring the flux variations in source-free 0.35"-diameter apertures across our reduced images of each field.
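As a rough illustration of how a depth of this kind can be estimated, the sketch below converts the scatter of fluxes measured in empty apertures into a 5σ AB limit for a point source. It is not the code used for Table 1: the function name, the use of a scaled median absolute deviation as the scatter estimator, and the `encircled_energy` argument are all assumptions made for the example.

```python
import math
import statistics

def five_sigma_depth(aperture_fluxes_njy, encircled_energy=1.0):
    """5-sigma AB depth from fluxes (in nJy) measured in source-free apertures.

    encircled_energy is the (assumed) fraction of a point source's flux
    that falls inside the aperture; 1.0 means no aperture correction.
    """
    # Robust scatter: median absolute deviation scaled to a Gaussian sigma.
    med = statistics.median(aperture_fluxes_njy)
    mad = statistics.median(abs(f - med) for f in aperture_fluxes_njy)
    sigma = 1.4826 * mad
    # Total flux of a point source detected at 5 sigma in the aperture.
    limit_njy = 5.0 * sigma / encircled_energy
    # AB magnitude for a flux density in nJy (zeropoint of 31.4 mag).
    return -2.5 * math.log10(limit_njy) + 31.4

# Hypothetical empty-aperture fluxes in nJy.
depth = five_sigma_depth([-2.0, -1.0, 0.0, 1.0, 2.0], encircled_energy=0.7)
```

Dividing by the encircled energy makes the quoted depth shallower, which mirrors the correction to total point-source magnitudes described in the Table 1 caption.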
Our fiducial reductions of each data set are executed using the grizli software (Brammer et al. 2022). grizli has procedures in place both to minimize the 1/f noise and to mask "snowballs" on individual NIRCam frames. grizli combines NIRCam frames using the astrodrizzle software package, after modifying the headers of the frames to use the required SIP WCS headers. The reductions also take advantage of both significantly improved flat fields (and the jwst_0942.pmap calibration files) that became available in early September and the zeropoint adjustments derived by G. Brammer et al. (2022, in prep).

Figure 1. Two color Lyman-break selection criteria we use in identifying our nominal z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17 candidate galaxies in three of the JWST/NIRCam data sets that have been widely used for early galaxy selections. The thick black lines indicate the boundaries of our color-color criteria. The blue lines indicate the expected colors of star-forming galaxies with 100 Myr constant SF histories and E(B−V) dust extinction of 0, 0.15, and 0.3, with colors at specific redshifts indicated by the black dots. The red lines show the expected colors of lower-redshift template galaxies from Coleman et al. (1980) out to z ∼ 5. The solid blue circles show the colors of specific sources in our primary selection. In cases where sources are not detected in a band, they are shown with an arrow at the 1σ limit. These nominal selections span the redshift ranges z ∼ 7-9, z ∼ 9-11, z ∼ 12-14, and z ∼ 16-19, respectively (cf. Figure 2).

To better understand possible systematics and how they impact the selection of star-forming galaxies at z ≥ 6, we also make use of the pencil NIRCam imaging pipeline (Magee et al. 2022, in prep) built for the PRIMER team (PI: Dunlop). This pipeline leverages STScI's JWST Calibration pipeline (v1.6.2), but also includes additional processing steps which are not part of the standard calibration pipeline. These include the subtraction of 1/f noise striping patterns (both vertical and horizontal) that are not fully removed by the standard calibration pipeline and the subtraction of "wisp" artifacts from the short wavelength filters F150W and F200W in the NRCA3, NRCB3, and NRCB4 detector images.
Additionally, the background sky subtraction is performed by subtracting the median background over an N×N grid while using a segmentation map to mask pixels attributed to sources. Image alignment is executed in two passes using the calibration pipeline's TweakReg step and then the STScI python package TweakWCS: the first pass uses TweakReg to group overlapping images for each detector/filter and perform an internal alignment within the detector/filter group; the second performs alignment against an external catalog using TweakWCS. The external catalog is, if possible, generated from an HST ACS/WFC image mosaic which has been registered to the GAIA DR3 catalog.
Finally, the pencil reductions we utilize take advantage of calibration files (jwst_1009.pmap) which have been updated to reflect new in-flight photometric zeropoints. Before the final NIRCam image mosaics are generated using the calibration pipeline calwebb_image3 stage, we perform an additional step to identify and mask "snowball" artifacts that are not identified and masked during the calwebb_detector1 stage.

For the HST Advanced Camera for Surveys (ACS) and Wide Field Camera 3 near-IR (WFC3/IR) observations over the SMACS0723, GLASS parallel, and CEERS Extended Groth Strip (EGS) fields, we made use of reductions of the data generated by grizli for our fiducial set of reductions. For the alternate set of reductions made with pencil, we used reductions of the SMACS0723 and GLASS parallel fields that follow many of the same procedures used in the production of the XDF data set (Illingworth et al. 2013). Finally, for the CEERS EGS field, we made use of the ACS and WFC3/IR data products made available by the CEERS team prior to the start of science observations by JWST.

Source Detection and Photometry
As in previous efforts by our team (e.g. Bouwens et al. 2011, 2015, 2019, 2021, 2022c), we perform source detection and photometry using SExtractor (Bertin & Arnouts 1996). For our F090W, F115W, F150W, and F200W dropout selections, source detection is performed using the square root of χ² image constructed by coadding PSF-matched F200W, F277W, F356W, and F444W data over the fields. PSF matching is done using our own implementation of the Lucy-Richardson deconvolution algorithm (Richardson 1972; Lucy 1974).
Color measurements for sources are made based on the measured flux in 0.35"-diameter apertures, after PSF-correcting the shorter wavelength data to match the PSF in the F444W band. These measurements are then corrected to total using (1) the additional flux in scalable Kron (1980) apertures with a Kron factor of 2.5 and (2) the estimated flux outside these scalable apertures based on the encircled energy distribution in the derived PSFs.
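To make the detection image concrete, the following sketch builds a square-root-of-χ² image from a set of PSF-matched images and their per-pixel noise maps. It is an illustrative toy using nested lists, not the pipeline used here; the function name and the assumption that a per-pixel rms map is available for each band are ours.

```python
import math

def sqrt_chi2_image(images, rms_maps):
    """Square root of the chi-square image for a stack of bands.

    images, rms_maps: lists of 2D arrays (nested lists) with identical
    shapes, one pair per band. Each pixel of the chi-square image is the
    sum over bands of (flux / rms)**2.
    """
    ny, nx = len(images[0]), len(images[0][0])
    chi2 = [[0.0] * nx for _ in range(ny)]
    for img, rms in zip(images, rms_maps):
        for y in range(ny):
            for x in range(nx):
                chi2[y][x] += (img[y][x] / rms[y][x]) ** 2
    return [[math.sqrt(v) for v in row] for row in chi2]

# Two hypothetical single-pixel "bands" with unit noise.
detection = sqrt_chi2_image([[[3.0]], [[4.0]]], [[[1.0]], [[1.0]]])
```

Because each band enters in units of its own noise, a pixel value in the result behaves roughly like a combined signal-to-noise across the coadded bands.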
Finally, a foreground dust correction based on extinction maps of Schlafly & Finkbeiner (2011) is applied to colors and total magnitude measurements.

Lyman Break Selections
We make use of two-color Lyman break selections to identify z > 6 galaxies from the JWST data of the three fields identified above. Lyman-break selections have been shown to be a very efficient way of identifying star-forming galaxies in the distant universe (e.g., Steidel et al. 1999; Bouwens et al. 2011, 2015, 2021; Schenker et al. 2013), and the sources they select largely lie at the redshifts targeted, given adequate S/N and bands either side of the break (e.g. Steidel et al. 1999, 2003; Stark et al. 2010; Ono et al. 2012; Finkelstein et al. 2013; Oesch et al. 2015; Zitrin et al. 2015; Oesch et al. 2016; Hashimoto et al. 2018; Jiang et al. 2021).
In devising color-color criteria for our selection, we follow the strategy employed in Bouwens et al. (2015, 2021) and make use of a two-color selection criterion, the first color probing the Lyman break and the second color probing the color of the UV-continuum just redward of the break. In choosing the passbands to utilize for this second color, we select bands which show no overlap with either the Lyman or Balmer breaks, ensuring that our selection would include even sources with prominent Balmer breaks, as Labbe et al. (2022) find at z ∼ 7-10.
After some experimentation, we made use of the following two color criteria: for our nominal z ∼ 8 selection,  for our nominal z ∼ 17 selection. In cases where sources are undetected in a given band, flux is set to the 1σ limit when applying the selection criteria. As we show below, when we discuss the redshift selection functions, these nominal redshift selections are better characterized as selections with z ∼ 7-9, z ∼ 9-11, z ∼ 12-14, and z ∼ 16-19 (the approximate half-power points of the redshift selection functions).

Figure 2. Redshift selection functions for our nominal z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17 selections leveraging a Lyman-break selection identifying sources dropping out in the F090W, F115W, F150W, F200W, and F277W bands, respectively. The mean redshift of these selections is 7.7, 10.0, 12.9, and 17.6, respectively. These nominal redshift selections can be characterized as ranges with z ∼ 7-9, z ∼ 9-11, z ∼ 12-14, and z ∼ 16-19 (the approximate half-power points of the redshift selection functions).

Given the significant variation in the composition of various z ≥ 8 selections in the literature, we have purposefully required that sources show especially large Lyman breaks to maximize the robustness of the sources we select. The presence of a large spectral break is perhaps the most model-independent feature of star-forming galaxies at very high redshifts, and selecting on it is significantly less sensitive to uncertainties in the NIRCam zeropoints than photometric redshift codes that rely on fits to the spectral energy distribution. As in Bouwens et al. (2015), sources are excluded from a Lyman break selection if they meet the selection criteria of a higher-redshift sample.
Additionally, we require that sources show no significant flux blueward of the break. For this, we co-add the flux blueward of candidate Lyman breaks using the χ² statistic defined in Bouwens et al. (2011, 2015), which is equal to $\chi^2 = \sum_i \mathrm{SGN}(f_i)\,(f_i/\sigma_i)^2$, where $f_i$ is the flux in band $i$ in a consistent aperture, $\sigma_i$ is the uncertainty in this flux, and $\mathrm{SGN}(f_i)$ represents the nominal "sign" function from mathematics, being equal to 1 if $f_i > 0$ and −1 if $f_i < 0$. Included in this statistic for the following redshift selections are the following bands: z ∼ 8: 

To ensure that sources in our selection correspond to real objects, we require that sources be detected at 6σ in a stack of all bands redward of the break in a 0.35"-diameter aperture. We also require sources to be detected at 5σ in the band just redward of the break to ensure that the break is present at high significance. Sources are also required to be detected at 3.5σ in at least 5, 4, 4, and 3 independent bands redward of the Lyman break in our z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17 selections, respectively.

Following our selection of candidate z ∼ 8-17 sources using the above criteria, redshift likelihood functions P(z) were computed for each source using the EAZY photometric redshift code (Brammer et al. 2008). In fitting the photometry of individual sources, we made use of spectral templates from the EAZY_v1.0 set and Galaxy Evolutionary Synthesis Models (GALEV: Kotulla et al. 2009). Nebular continuum and emission lines were added to the templates according to the prescription provided in Anders & Fritze-v. Alvensleben (2003), assuming a 0.2 Z⊙ metallicity, and scaled to a rest-frame EW for Hα of 1300Å. Sources are only retained in our z > 6 selections if >80% of the integrated redshift likelihood is at z > 5.5, i.e., P(z > 5.5) > 0.8.
Finally, all candidate z ≥ 8 galaxies are visually examined to exclude any sources associated with diffraction spikes, on the wings of early type galaxies, or in regions of the images with elevated background levels.
The approximate redshift distributions of these selections are illustrated in Figure 2 and derived using our selection volume simulations described in §4.1. Using these selection volume simulations, the mean redshifts inferred for our F090W, F115W, F150W, and F200W dropout selections are equal to 7.7, 10.0, 12.9, and 17.6, hence our nominal use of z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17 to identify these samples throughout our paper.
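The two quantitative checks in this part of the selection, the signed χ² statistic for flux blueward of the break and the integrated redshift likelihood, can be sketched as below. This is a minimal illustration under stated assumptions: the function names are hypothetical, the likelihood is taken to be tabulated on a redshift grid that contains the threshold, and no χ² threshold is shown because the text above does not give one.

```python
def signed_chi2(fluxes, errors):
    """Sum over bands of SGN(f) * (f / sigma)**2 for flux blueward of the break."""
    total = 0.0
    for f, e in zip(fluxes, errors):
        sign = 1.0 if f > 0 else -1.0  # a flux of exactly 0 contributes nothing
        total += sign * (f / e) ** 2
    return total

def prob_above(z_grid, pz, z_min):
    """Fraction of the integrated likelihood at z >= z_min (trapezoid rule).

    Assumes z_grid is sorted and that z_min coincides with a grid point;
    only segments whose lower edge is at or above z_min are counted.
    """
    total = above = 0.0
    for k in range(len(z_grid) - 1):
        area = 0.5 * (pz[k] + pz[k + 1]) * (z_grid[k + 1] - z_grid[k])
        total += area
        if z_grid[k] >= z_min:
            above += area
    return above / total

# Hypothetical likelihood on a coarse grid; retained if more than 80% lies at z > 5.5.
z_grid = [0.5 * i for i in range(41)]                 # z = 0 ... 20
pz = [1.0 if z >= 9.0 else 0.01 for z in z_grid]      # toy high-redshift solution
keep = prob_above(z_grid, pz, 5.5) > 0.8
```

Negative fluxes reduce the statistic rather than adding to it, so noise fluctuations in undetected bands tend to cancel instead of mimicking a detection.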

z ≥ 8 Selections
Applying our selection criteria to our fiducial reductions using grizli, we identify 18 z ∼ 8, 12 z ∼ 10, 5 z ∼ 13, and 1 z ∼ 17 galaxies which satisfy all of our selection criteria. The apparent magnitudes of these sources range from 25.8 to 28.4 mag. A list of these sources is presented in Table 2 and will be known as our primary selection. Figure 1 shows the colors of our selected z ∼ 8-17 sources relative to the two color criteria we utilize.
We also indicate in Table 2 which sources from our selections lie in earlier selections. Encouragingly, ∼67% of the z ≥ 8 candidates from our selection lie in previous z ≥ 8 selections. This is a significantly higher level of overlap than was the case for the first set of z ≥ 8 selections from the SMACS0723, CEERS, and Abell 2744 parallel data sets, thanks to the slightly more complete selections of z ≥ 8 sources in  and Donnan et al. (2023, v2) taking advantage of a much improved NIRCam zeropoint calibration.
We also pursue an alternate selection of z ≥ 8 galaxies based on our NIRCam reductions using the pencil pipeline. We identify 22 z ∼ 8, 13 z ∼ 10, 3 z ∼ 13, and 1 z ∼ 17 galaxy candidates in that selection. Table A1 in Appendix A provides the coordinates, estimated redshifts, magnitudes, constraints on the Lyman break amplitudes, and estimated likelihood to lie at z > 5.5.
Comparing our two selections, we find ∼47% of the sources in our primary selection are also present in our alternate selection, while ∼38% of the sources in our alternate selection also occur in our primary selection. The percentage overlap between our selections is similar to the ∼50% overlap frequently seen between different z ∼ 4-8 selections executed over the Hubble Ultra Deep Field with HST data (see §3.4 of Bouwens et al. 2015 for a discussion). The existence of differences between the two selections is not surprising given our use of two different reductions of the data to identify sources and measure fluxes.
In an effort to better understand why there is only modest overlap between the two selections, we compared the photometry of sources in catalogs where they are selected vs. where they are not selected. We found that most of the observed differences could be explained by variations in the size of the spectral breaks for selected sources and the apparent flux blueward of the nominal spectral breaks.

z ≥ 8 Selections from the Literature
Given the many challenges that exist in making use of the first JWST observations in identifying z ≥ 8 galaxies, the uncertain NIRCam zeropoints being perhaps the largest (e.g. Adams et al. 2023), it is useful for us to provide an alternate assessment of the many z ≥ 8 candidates which have been identified in the early JWST observations. This can help provide us with insight into both the reliability and completeness of earlier JWST/NIRCam selections.
There have been selections of z ≥ 8 galaxies conducted by at least ten different teams, including Naidu et al.  . In general, these studies have focused on identifying sources from one or more of the same three data sets considered here, allowing for an extensive set of comparisons between the different selections and independent assessments of various z ≥ 8 candidates.

Evaluation and Segregation into Different Subsamples
In this subsection, we focus on the z ≥ 8 candidates identified over the three most studied JWST NIRCam fields (SMACS0723, CEERS, and Abell 2744 parallel), and provide an independent evaluation of their robustness. To provide this evaluation, we have performed 0.35"-diameter aperture photometry on all the identified candidates from Naidu et al.  . For papers where changes have occurred to the identified high-redshift sources, we consider the catalogs in each version of various papers. Not only is this useful for gaining perspective on progress that has been made, but we have found that some sources in earlier versions of papers also appear to be credible z > 8 candidates, and therefore we have included these sources in our analysis to be as comprehensive as possible. We perform photometry on both our fiducial and alternate reductions using grizli and pencil, respectively.
We then look for the presence of a significant spectral break in sources and compute redshift likelihood distributions for candidates based on our derived photometry of the recent grizli and pencil reductions. We then segregate sources into three different samples:
• (1) one where the cumulative probability of candidates lying at z > 5.5, i.e., P(z > 5.5), exceeds 99% using the photometry we have performed on both NIRCam reductions utilized here,
• (2) one where P(z > 5.5) is in excess of 80% and 50% for our fiducial and secondary reductions, in excess of 70% for both our fiducial and secondary reductions, or in excess of 50% and 80% for our fiducial and secondary reductions, but does not satisfy the former selection criteria, and
• (3) one which does not satisfy either of the former selection criteria.

We refer to (1) the first selection of sources from the literature as the "robust" sample, (2) the second selection as the "solid" sample, and (3) the third selection of sources as the "possible" sample.

Table 3. Sample of z ∼ 10, z ∼ 13, and z ∼ 17 candidates that we deem to be "robust" using our own photometry on two separate reductions of the NIRCam data.
Spectroscopically confirmed to have a redshift z = 8.498 (Carnall et al. 2023).
† While these candidates were identified in earlier versions of these manuscripts, they did not make it into the final versions of these manuscripts. Nonetheless, our analysis suggests they are "robust" z > 5.5 sources.
* For simplicity, no account is made for lensing magnification for sources over the SMACS0723 and Abell 2744 parallel fields.
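The likelihood thresholds that separate the three samples can be written compactly as a small function. This is only a sketch of the P(z > 5.5) part of the grading described above; the function and argument names are ours, and the accompanying check for a significant spectral break is not modelled.

```python
def grade_candidate(p_fiducial, p_secondary):
    """Grade a literature candidate from P(z > 5.5) in the two reductions.

    p_fiducial, p_secondary: integrated likelihood at z > 5.5 (0 to 1)
    from photometry on the fiducial and secondary reductions.
    """
    # "Robust": above 99% in both reductions.
    if p_fiducial > 0.99 and p_secondary > 0.99:
        return "robust"
    # "Solid": any of the three threshold pairs, but not robust.
    solid = (
        (p_fiducial > 0.8 and p_secondary > 0.5)
        or (p_fiducial > 0.7 and p_secondary > 0.7)
        or (p_fiducial > 0.5 and p_secondary > 0.8)
    )
    return "solid" if solid else "possible"

# Hypothetical likelihood pairs for three candidates.
grades = [grade_candidate(0.995, 0.999),
          grade_candidate(0.85, 0.60),
          grade_candidate(0.90, 0.40)]
```

Requiring agreement between two independent reductions means that a candidate with a high likelihood in only one reduction (the last case) is graded "possible" regardless of how high that single value is.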

Results
We present these different samples in Table 3 and Tables B1-B5 of Appendix B. Interestingly enough, only 18 z ∼ 10 and 3 z ∼ 13 candidates from these fields satisfy our criteria for being robust. 75 of the reported candidates in the literature qualify as "solid" candidates, while 108 of these candidates qualify as "possible." Of the candidates we classify as "robust," only 10 have an estimated redshift z ≥ 9 using our fiducial photometry. The most consistent characteristic of sources in our robust lists is that they show either very pronounced (≥1.5-mag) spectral breaks in the observed photometry or show two spectral breaks (Lyman + Balmer, as was a key aspect of the Labbe et al. 2022 selection). Additionally, to the extent that the present compilation of "robust" z ≥ 9 candidates overlaps with the redshift ranges and fields examined by , three of the four sources from our compilation, i.e., GL−z11, GL−z13, and 32395_2 (Finkelstein et al. 2022a), receive robust designations in , with $\Delta\chi^2 = \chi^2(\mathrm{low}\ z) - \chi^2(\mathrm{high}\ z)$ equal to 71.9, 72.3, and 14.5, respectively, each of which is well above their $\Delta\chi^2 > 9$ selection criterion for secure sources. This is reassuring and gives us confidence that at least for this subset of z ≥ 9 candidates, the inferred redshifts might be reasonably secure.

Figure 3. Apparent magnitude and estimated redshifts of z ≥ 7 candidate galaxies identified over the JWST/NIRCam SMACS0723 (solid green circles), GLASS parallel (solid blue circles), and CEERS fields (solid red circles) from our searches of the data sets. The open symbols indicate other z ≥ 8 galaxy candidates that have been identified in the literature and qualify as "robust" or "solid" z ≥ 8 candidates according to our own SED fits and photometry. For context, we show the magnitudes and redshifts of sources that have been derived from a comprehensive set of blank and lensing fields observed with HST (Bouwens et al. 2015, 2022c). The blue solid line shows the apparent magnitude of sources with an absolute magnitude of −21, which is an approximate characteristic luminosity of galaxies at z ≥ 3 (Steidel et al. 1999; Reddy & Steidel 2009; Bouwens et al. 2015, 2021; Finkelstein et al. 2015; Bowler et al. 2015).
From these numbers, it is clear that the majority of z ≥ 9 candidates identified to date only qualify as "possible" z ≥ 9 candidates and do not meet the higher quality standards required to be classified as "robust" or "solid." Essentially all studies presenting significant samples of z ≥ 8 candidates from the first JWST fields, e.g., Yan et al. (2023), contain sources that lie in the "possible" category given our photometry. Of the candidates we grade as "robust," "solid," and "possible" and where we compute photometric redshifts z ≥ 9, 90%, 28%, and 12%, respectively, are also independently reported as a z ≥ 8 candidate galaxy in a separate manuscript from the literature. Figure 3 shows the distribution of the sources we find from our reductions over the three NIRCam fields (SMACS0723, GLASS parallel, and CEERS) in redshift and luminosity vs. the comprehensive earlier selection of z ∼ 2-11 sources from Hubble constructed by Bouwens et al. (2015, 2021, 2022c). We also show the "robust" or "solid" z ≥ 8 galaxy candidates from the earlier studies with JWST. In Figure 4, we show the number of sources that are contained in our literature subsamples of "robust," "solid," and "possible" candidates as a function of redshift. The numbers of "solid" and "possible" candidates at z > 11.5 are ∼8× and ∼24× larger, respectively, than the number we grade as "robust." Clearly, it is essential that higher quality JWST data become available for sources in these samples to determine the fraction that are actually at z > 11.5.

Notes to Table 4:
Leethochawalit et al. (2023) or Endsley et al. (2022) where source selection only extends to redshifts of z ≈ 9.
Number of z ≥ 8.5 candidates identified in the magnitude range well probed in most studies in the literature (i.e., <29 mag).
Fraction of z ≥ 8.5 candidates from this study that satisfy our criteria for being "robust" z ≥ 8.5 candidates.
Fraction of z ≥ 8.5 candidates from this study that satisfy our criteria for being "robust" or "solid" z ≥ 8.5 candidates.
Fraction of the total set of "robust" and "solid" z ≥ 8.5 candidates (Tables 3, B1, and B2) identified in a given study. Only the search fields utilized in a study are considered for these completeness estimates.
Fraction of the total number of z ≥ 8.5 candidates identified in a given study. Only the search fields utilized in a study are considered for these completeness estimates.
The purity of selections focusing on the SMACS0723 cluster and parallel field is likely lower than the other selections due to the lack of especially sensitive F115W data over the fields.
We would not expect the Labbe et al. (2022) selection to be an especially complete representation of star-forming galaxies at z > 6, given their choice to select only those galaxies with prominent Balmer breaks.
Since Naidu et al. (2022b) expressly only search for z > 10 sources which are particularly bright and which show high S/N (>10) detections in both the F356W and F444W bands, we somewhat arbitrarily evaluate the completeness of their selection to 27 mag.

Characterization of Literature Subsamples
To help interpret the quality of the z ≥ 8 candidates that we segregated into different categories, we derive median fluxes for candidates in each category. Prior to the median stacking, a renormalization of the fluxes in individual sources is performed such that the F200W, F277W, and F356W band fluxes for the z ∼ 10, z ∼ 13, and z ∼ 17 samples, respectively, are 36 nJy. The results are presented in Table C1 of Appendix C.
The most significant difference between the different stacks is the flux blueward of the break. For both the "robust" and "solid" stacks, no significant flux is present blueward of the putative Lyman breaks and large Lyman breaks are seen, i.e., ≥1.8 mag. However, for the "possible" stack, not only is the flux in the median stacks nominally significant (1-2σ) in individual bands, but the putative breaks are smaller, i.e., ∼1.0-1.5 mag. Because of such characteristics, the reliability of sources in the "possible" samples is lower, as indicated also by the much greater likelihood these sources show for being at lower redshifts from the individual SED fit results.
To provide some measure of the quality and completeness of earlier selections of z ≥ 8.5 galaxies derived from the first JWST fields, we have calculated the total number of "robust"+"solid" z ≥ 8.5 candidates that have been identified to ∼29 mag over various JWST fields by our selection or those in the literature and quantified the fraction of these candidates that have been identified in various studies. We present this fraction as the completeness of each selection in Table 4. In estimating the completeness, we only consider the fields included in a given selection. As one example, since  only search for z ∼ 8-11 F090W-dropout galaxies over the GLASS parallel field as part of their z ∼ 9 selection, we do not treat compelling z ∼ 8-11 galaxies found over other fields like SMACS0723 or CEERS as contributing to our assessment of completeness in their study.
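A minimal sketch of the renormalize-then-median procedure is given below, assuming each source is represented as a mapping from band name to flux in nJy and that every source has a positive flux in the normalization band. The function name and data layout are illustrative choices, not the format of our catalogs.

```python
import statistics

def median_stack(sources, norm_band, norm_flux=36.0):
    """Median-stack fluxes after rescaling each source to a common flux.

    sources: list of dicts mapping band name -> flux in nJy.
    norm_band: band whose flux is rescaled to norm_flux for every source.
    """
    scaled = []
    for src in sources:
        scale = norm_flux / src[norm_band]
        scaled.append({band: flux * scale for band, flux in src.items()})
    bands = scaled[0].keys()
    # Per-band median over the rescaled sources.
    return {band: statistics.median(s[band] for s in scaled) for band in bands}

# Three hypothetical z ~ 10 candidates, normalized in F200W.
stack = median_stack(
    [{"F150W": 1.0, "F200W": 18.0},
     {"F150W": 6.0, "F200W": 72.0},
     {"F150W": -2.0, "F200W": 36.0}],
    norm_band="F200W",
)
```

Rescaling first keeps bright sources from dominating the stack, so the median in the bands blueward of the break reflects the typical break amplitude of the sample rather than that of its brightest members.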

Evaluation of Earlier ≥ 8 Selections
[Figure caption, beginning missing] ... and "possible" sources vs. redshift (red, pink, and gray shaded histograms, respectively). The redshift of any given source is taken to be equal to the redshift of the study where it is presented in the literature for our "solid" and "possible" samples and the geometric mean redshift for sources in our "robust" sample. The numbers of sources per bin are shown in a cumulative or stacked sense, such that the top of the gray histogram indicates the total number of sources in the "robust," "solid," and "possible" literature samples. There are ∼8× more "solid" and ∼24× more "possible" sources reported in the literature at z > 11.5 than there are "robust" sources, illustrating the potentially large uncertainties in the overall number of bona-fide star-forming galaxies at very high redshifts. Given the uncertainties, it is clearly imperative to definitively quantify the redshifts of many of these candidates to determine how rapidly galaxies assemble in the early universe.
In Table 4, we also present an approximate "purity" for each selection by dividing both the number of "robust" candidates and the number of "robust" and "solid" candidates in each selection by the total number of candidates reported in that study. In grading individual candidates from various studies, we only include sources with magnitudes brightward of 29 mag to limit our analysis to those sources with the highest S/N and to increase the probability that sources will be selected as part of multiple studies.
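The purity and completeness bookkeeping described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` record, the function names, and the toy catalogue are hypothetical and are not taken from the paper, and the 29 mag brightness cut is assumed to have been applied before these functions are called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    source_id: str  # hypothetical identifier, for illustration only
    field: str      # field in which the source lies
    grade: str      # "robust", "solid", or "possible"


def purity(reported):
    """Return (robust fraction, robust+solid fraction) of a study's candidates."""
    n_total = len(reported)
    if n_total == 0:
        raise ValueError("selection is empty")
    n_robust = sum(c.grade == "robust" for c in reported)
    n_solid = sum(c.grade == "solid" for c in reported)
    return n_robust / n_total, (n_robust + n_solid) / n_total


def completeness(reported, all_candidates, fields_searched):
    """Fraction of all robust+solid candidates in the searched fields that a study found."""
    reference = {
        c.source_id
        for c in all_candidates
        if c.grade in ("robust", "solid") and c.field in fields_searched
    }
    if not reference:
        raise ValueError("no robust or solid candidates in the searched fields")
    recovered = reference & {c.source_id for c in reported}
    return len(recovered) / len(reference)


# Toy catalogue (made-up sources, not real candidates).
ALL_CANDIDATES = [
    Candidate("a", "GLASS", "robust"),
    Candidate("b", "GLASS", "solid"),
    Candidate("c", "GLASS", "solid"),
    Candidate("d", "GLASS", "possible"),
    Candidate("e", "CEERS", "robust"),
]
# A hypothetical study that searched only the GLASS parallel field.
STUDY = [ALL_CANDIDATES[0], ALL_CANDIDATES[1], ALL_CANDIDATES[3]]

PURITY_ROBUST, PURITY_ROBUST_SOLID = purity(STUDY)
COMPLETENESS = completeness(STUDY, ALL_CANDIDATES, {"GLASS"})
```

Restricting the reference set to `fields_searched` mirrors the rule that a study is not penalized for candidates in fields it never searched.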
We caution that the results we obtain here are completely reliant on the photometry we derive for the candidates from our two reductions and the SED template sets we utilize in our analysis. As such, these results (and the remarks in the paragraphs which follow) should be taken as merely indicative, and clearly the ultimate arbiter of the purity and completeness of individual selections will be deep spectroscopy with JWST (e.g. Roberts-Borsani et al. 2022a; Curtis-Lake et al. 2022; Tang et al. 2023; Bunker et al. 2023). For the purposes of this calculation, we treat sources with a "possible" designation as corresponding to lower-redshift interlopers, but clearly there is some uncertainty in this designation and many candidates we grade in this category might well prove to be at z ≥ 8.
There are a few noteworthy results in this table. First of all, there has been a clear improvement in both the purity and completeness of most z ≥ 8 samples since NIRCam data from JWST became public, as one might expect due to improvements in the NIRCam zeropoint calibrations. As one example, the purity of the  selections, in terms of sources graded either "robust" or "solid", has improved from 59% (v1) to 90% (v2). Other newer analyses which are able to take advantage of the improved zeropoint calibrations are the fiducial and secondary selections from the present analysis as well as those from Finkelstein et al. (2022b); these selections feature a purity of 88%, 76%, and 87%, respectively. Achieving a high purity appears to have been more difficult for analyses that focus on the SMACS0723 data set (e.g., Yan et al. 2023, but see Adams et al. 2023), likely due to the significantly shallower F115W observations available in the first JWST data over that field.
Second, selections that focus on the most luminous galaxies at z ≥ 8, i.e., Naidu et al. (2022b,a); Adams et al. (2023), or selections which focus on sources with multiple spectral breaks (e.g. ) show a much higher reliability than those that focus on a broader selection of sources. Based on the present analyses, we find 100% purity for all three of these selections. This contrasts with more ambitious selections aiming to select the bulk of the star-forming galaxies at z ≥ 8, e.g., , , , and the present selections, where ∼25% of the sources are graded as "robust," 50-60% are graded as "solid," and the final ∼15% are graded as "possible."
A third striking result is the large difference in the completeness of selections. The majority of the analyses only include a fraction ( 35%) of the candidates we grade as "solid" or "robust" in our analysis. In many analyses, this appears to have been the result of a clear choice to include only those sources which appear to be the most reliable, either because higher amplitude Lyman breaks are required (this work) or because the SED fits to z > 8 solutions are required to give much lower values of χ² (Δχ² > 9) than lower-redshift fits. Nevertheless, the  selection appears to perform the best as far as completeness is concerned, showing a ∼2× higher completeness in their identification of "solid"+"robust" z ≥ 8 sources than most of the other analyses and also successfully selecting the z ≥ 8 sources found by  with prominent Balmer breaks (Table 3). The latter sources are mostly missed by our own selections because their Lyman breaks have a smaller amplitude than the 1.5 mag required for inclusion in our own samples.
One consequence of the relatively low estimated completeness for most selections is only a modest (∼20-35%) overlap between z ≥ 8 selections. Table 5 quantifies the number of sources that are in common for differing selections over the same fields out of some total possible. Nevertheless, it is worth noting that there has been an improvement in the overlap between samples. Initially, most of the overlap between studies was confined to a few bright z ∼ 10-12 sources such as have been found by Naidu et al. (2022b) and  and perhaps 10-15% of the rest, but now the overlap is ∼30% between selections, approaching the ∼50% overlap seen in z ∼ 4-8 selections obtained by HST over the Hubble Ultra Deep Field (Beckwith et al. 2006), e.g., see §3.4 of Bouwens et al. (2015) where overlap with other z ∼ 7-8 selections (e.g. McLure et al. 2013; Schenker et al. 2013) is discussed.

LUMINOSITY FUNCTION RESULTS
In this section, we make use of the rather small samples of high-likelihood z ≥ 8 galaxy candidates over the three most well-studied JWST fields to derive LF results. We begin with direct determinations of the LF using our own selections and then move on to determinations based on collective samples of z ≥ 8 galaxies identified in the present and previous studies. We conclude this section with a comparison of these results with several previous determinations.

Notes to Table 5:
There is no overlap between specific fields and redshift ranges utilized in the two selections being compared.
Remarkably, approximately half of the overlap between these studies is the two bright sources from Naidu et al. (2022b) and . If we exclude those two sources from consideration, overlap between the selections is only ∼10%.
Fraction in parentheses indicates the overlap in the initial versions of the catalogs from these papers. In the majority of cases, the fraction in the updated versions is higher.

Results Using Our Own Samples
We begin by describing LF results derived using our own z ≥ 8 samples constructed from our fiducial reductions of the available JWST data.
Given the small number of sources in each of our samples, we derive LF results using the 1/V_max technique and assuming Poissonian statistics. As in our own earlier analyses, we derive LF results by maximizing the likelihood L of producing the observed distribution of apparent magnitudes given some model LF:

$$\mathcal{L} = \prod_{\rm field} \prod_{i} p(m_i),$$

where we take the likelihood of LF results derived over the set of fields in which we select sources and over a set of apparent magnitude intervals $m_i$.
Given the lack of F090W observations over the CEERS fields and the limited depth of F115W observations over SMACS0723, we only consider sources in the GLASS parallel field for our z ∼ 8 LF determination and sources over the GLASS parallel and CEERS fields for our z ∼ 10 LF determinations. For our z ∼ 13 and z ∼ 17 determinations, we consider sources over the GLASS parallel, SMACS0723, and CEERS fields. For simplicity and because none of our z ≥ 12 candidates lie within 60" of the high magnification areas of the Abell 2744 and SMACS0723 clusters, we ignore the impact of lensing magnification on our LF results.
[Figure caption, beginning missing] ... are shown with the gold, green, magenta, and blue points, respectively. The black lines give the LF results one would obtain from the early NIRCam fields if there were just a single source per 1-mag bin, indicating approximately how low in volume density the early fields probe (the equivalent 1σ upper limits are indicated by the upper edge of the gray shaded region obtained by multiplying these lines by 1.841: Gehrels 1986). Gray lines are drawn connecting the gray points (to help delineate the LF derived from the full literature sample of sources). The reason the gray line has a much higher normalization than any LF in the literature is that a much larger number of z ≥ 9 candidates has been reported thus far over the SMACS0723, GLASS parallel, and CEERS fields than is present in any one individual analysis. A better match with the early LF results from JWST can be obtained by multiplying the shaded gray region by 0.4 (shown with the green line). These results suggest that either early JWST LF results in individual analyses are too low (due to incompleteness) or that early selections suffer from substantial (≥50%) contamination from lower-redshift galaxies.
Since we are assuming Poissonian statistics, the probability of finding $n_{{\rm observed},i}$ sources is

$$p(m_i) = e^{-n_{{\rm expected},i}} \frac{(n_{{\rm expected},i})^{n_{{\rm observed},i}}}{(n_{{\rm observed},i})!},$$

where $n_{{\rm observed},i}$ is the number of observed sources in magnitude interval $i$ while $n_{{\rm expected},i}$ is the expected number given some model LF. We compute the number of expected sources $n_{{\rm expected},i}$ based on some model LF $\phi_j$ using the equation

$$n_{{\rm expected},i} = \sum_j \phi_j V_{i,j},$$

where $V_{i,j}$ is the effective volume over which a source in the magnitude interval $j$ might be both selected and have a measured magnitude in the interval $i$. We compute the selection volume for our samples by inserting artificial sources with various redshifts and apparent magnitudes at random positions within the NIRCam images for each of these fields and then attempting both to detect the sources and select them using our z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17 selection criteria.
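As a rough illustration of the Poissonian likelihood used here, the sketch below evaluates the log-likelihood of a stepwise LF for a single field, taking the expected counts in each apparent-magnitude bin to be the sum over LF bins of volume density times effective volume. The function names, the volume matrix, and the counts are hypothetical stand-ins rather than values from the paper. The last function checks that the factor 1.841 attributed to Gehrels (1986) equals the one-sided 1σ Poisson upper limit for zero detected sources.

```python
import math


def expected_counts(phi, volumes):
    """Expected sources per apparent-magnitude bin i: sum_j phi[j] * volumes[i][j]."""
    return [sum(p * v for p, v in zip(phi, row)) for row in volumes]


def log_likelihood(phi, volumes, observed):
    """Poisson log-likelihood of the observed counts given a stepwise LF phi."""
    total = 0.0
    for n_obs, n_exp in zip(observed, expected_counts(phi, volumes)):
        if n_exp <= 0.0:
            if n_obs > 0:
                return -math.inf  # sources observed where none can be expected
            continue
        total += n_obs * math.log(n_exp) - n_exp - math.lgamma(n_obs + 1)
    return total


def one_sigma_upper_limit_zero_counts():
    """One-sided 1-sigma (84.13%) Poisson upper limit when zero sources are seen."""
    confidence = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))
    return -math.log(1.0 - confidence)


# Hypothetical effective volumes in Mpc^3 (diagonal: no scatter between bins).
VOLUMES = [[2.0e4, 0.0], [0.0, 1.0e4]]
OBSERVED = [1, 3]
# With a diagonal volume matrix the maximum-likelihood LF is counts / volume.
PHI_BEST = [OBSERVED[i] / VOLUMES[i][i] for i in range(len(OBSERVED))]
```

With off-diagonal terms in the volume matrix (photometric scatter between bins), the maximum would have to be found numerically, and the likelihoods of the separate fields would be multiplied together.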
We assume the UV-continuum slopes β of sources to have a mean value of −2.3, with a 1σ scatter of 0.4. These UV-continuum slopes are in reasonable agreement with determinations available on the basis of both HST+Spitzer data (e.g., Dunlop et al. 2013; Wilkins et al. 2016; Stefanon et al. 2022) and now JWST data (Cullen et al. 2022). Additionally, we adopt point-source sizes for the artificial sources we inject into various images in our simulation and recovery experiments. While the present size assumptions are not especially different from the sizes found for galaxies at z ∼ 8-17, both using earlier HST observations and now using JWST observations (Naidu et al. 2022b,a; Ono et al. 2022), they may lead to a slight overestimate of the total selection volume. While it is worthwhile keeping this in mind for the discussion which follows, these uncertainties are likely small in comparison to the very large uncertainties in the total number of bona-fide z ≥ 8 galaxies over these fields (amongst the many sources from the literature we have graded as "possible").
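We use 0.5-mag bins in deriving our stepwise LF results, while for our parametric determinations, we adopt both a Schechter and double power-law functional form:

$$\phi(M) = \phi^* \frac{\ln 10}{2.5}\, 10^{-0.4(M-M^*)(\alpha+1)} \exp\left(-10^{-0.4(M-M^*)}\right)$$

$$\phi(M) = \frac{\phi^*}{10^{0.4(\alpha+1)(M-M^*)} + 10^{0.4(\beta+1)(M-M^*)}}$$

where φ* is the normalization, α is the faint-end slope, β is the bright-end slope, and M* indicates some characteristic luminosity where there is a transition between the two regimes.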
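For readers who want to evaluate the two parametric forms, the following sketch writes them in absolute magnitudes. It uses the standard textbook expressions for the Schechter function and the double power law; the function names are illustrative, and the exact normalization convention adopted in the paper is an assumption here.

```python
import math

LN10_OVER_2P5 = math.log(10.0) / 2.5


def schechter(mag, phi_star, m_star, alpha):
    """Schechter LF per unit magnitude: power law at the faint end,
    exponential cut-off brightward of m_star."""
    x = 10.0 ** (-0.4 * (mag - m_star))  # luminosity in units of L*
    return LN10_OVER_2P5 * phi_star * x ** (alpha + 1.0) * math.exp(-x)


def double_power_law(mag, phi_star, m_star, alpha, beta):
    """Double power-law LF per unit magnitude: slope alpha faintward of
    m_star and slope beta brightward of it."""
    dm = mag - m_star
    return phi_star / (
        10.0 ** (0.4 * (alpha + 1.0) * dm) + 10.0 ** (0.4 * (beta + 1.0) * dm)
    )
```

For example, `double_power_law(m_star, phi_star, m_star, alpha, beta)` returns `phi_star / 2` for any slopes, which is a quick sanity check on the normalization. The double power law falls off more gently at the bright end than the exponential cut-off of the Schechter form.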
[Figure caption, beginning missing] The partially transparent pink and red points indicate the LF results derived alone from the "solid" and "robust" candidates, respectively. The shaded red region gives LF results one would obtain including all the literature candidates in our "robust" list, but no more than in our "solid" list of candidates. That even the low edge of the shaded region exceeds constant star formation efficiency model predictions at z ∼ 10 suggests that star formation in the early universe may be much more efficient than suggested in many analyses with HST and ground-based data (Bouwens et al. 2015; Harikane et al. 2018, 2022; Oesch et al. 2018; Tacchella et al. 2018; Stefanon et al. 2021, 2022). Differences with the constant star formation efficiency models are even more substantial at z ≥ 13.
For the Schechter function results, we fix M* to −21.15 mag consistent with the […] Bowler et al. (2020) provide for evolution of the LF using a double power-law parameterization. We present our binned LF results at z ∼ 10, z ∼ 13, and z ∼ 17 in both Table 6 and Figure 5. The parameterized fit results are presented in Table 7 and in Figure 5 as red lines. We also derived LF results at z ∼ 8 as a test of our procedures for deriving LFs at z ≥ 10. The results are shown in both Figure D1 from Appendix D and Tables 6-7. Encouragingly, the results we obtain are consistent with the earlier determinations we obtained from HST data in Bouwens et al. (2021).
The present LF results appear to be fairly similar to the LF results of  and  at z ∼ 9-11. At z ∼ 13, we find a ∼1.5-2× higher volume density of sources than , , and Finkelstein et al. (2022a), and at z ∼ 17, the volume density we find for sources is ∼3× higher than what  recover. At the bright end of the z ∼ 10-13 LFs, our results are very similar to Naidu et al. (2022b). In general, there is broad similarity in all LF results obtained to the present with JWST, given the limited statistics available and thus large uncertainties.

LF Results from Our Literature Samples
As an alternative to direct determinations of the LF from our own selections of z ≥ 8 candidates, we also consider the use of the literature results we analyzed and characterized in (§3.3) to derive LF results at z ∼ 10, z ∼ 13, and z ∼ 17.
As we have already noted, large numbers of z ≥ 9 candidate galaxies have been identified in various analyses of the early NIRCam data, and the purpose of this analysis is to show the implications of these results for the z ≥ 9 LFs assuming that a significant fraction of these candidates are at z ≥ 9.
It is interesting to derive the implied LF results as a function of the apparent robustness level of the candidates, to demonstrate how high the volume density of sources is even including only the best candidates. We take the luminosity of individual candidates to be the values we measure based on our own photometry. A complete list of the candidates we utilize and their classification into the groups defined in (§3.3) is provided in Tables 3 and B1-B5.
Given the diverse selection criteria used to construct these literature LFs, we take the selection volume to be equal to the detection volume, as we very conservatively assume that all detected sources are selectable by one or more of the diverse selection criteria used in the literature. By making this assumption, our derived LFs should be as low as possible given the available selection volume at high redshift.
To illustrate the implications of including all of the published z ≥ 8 candidates to the present in LF determinations, we present z ∼ 10, z ∼ 13, and z ∼ 17 LF results in Figure 5 using the solid grey circles and error bars. For clarity, LF and luminosity density results derived for this "possible" sample and later for the "solid" literature sample also include the full set of sources from the "solid"+"robust" and "robust" samples, respectively. Gray lines are drawn connecting the grey points to help delineate the literature LF results. Those LF results are some ∼2-10× higher than the LF results reported by , , Bouwens et al. (2022a), and  over the luminosity range −20 to −18 mag. One explanation is that this full sample includes sources we have graded as "possible", which we estimate to have a lower probability of being at z > 8.
Thus, one reason these LFs might be so much in excess of the individual LF determinations is that the list of "possible" z > 8 candidates (Tables B3-B5) includes large numbers of lower-redshift interlopers. This motivates us to also derive LF results based on candidates which satisfy much more stringent quality requirements, such as those that make up our "solid" or "robust" sample of sources from the literature.
Results for the "solid" and "robust" samples are shown in Figure 6 with the partially transparent pink and red points, respectively. Even the z ∼ 10 LF results from the "solid" candidates exceed the results from  and  by factors of ∼1.5 to 2, but agree better with our own results and those of . At z ∼ 13 and z ∼ 17, the LF results derived from the "solid" candidates lie even more clearly in excess of the LF results from , , and our own analysis.
Given current uncertainties over what fraction of current z ≥ 9 candidate lists are bona-fide, we express the LF results we derive from the literature in Table 8 in terms of a region spanning the range between our LF results using the "robust" candidates and the candidates we classify as "solid." These results are also shown in Figure 6, and we can see that this region easily encompasses the range of LF results reported in various studies.

Evolution of Star-Forming Galaxies from z ∼ 17 to z ∼ 8
There has been a lot of discussion over the last ten years regarding how much star formation took place during the earliest epochs of the universe, when z > 10. Some of this discussion had been based on the evolution of the LF at z > 6 and debate between a slower evolution in the apparent SFR and luminosity density (e.g., McLeod et al. 2016) and a more rapid evolution (e.g., Oesch et al. 2014, 2018; Bouwens et al. 2021).
The relatively small number of apparently robust z ∼ 10 candidates identified in the wider area data searched by Oesch et al. (2018) seemed to weigh in favor of a faster evolution. Nevertheless, the apparent discovery of many luminous galaxy candidates (particularly now with JWST) in the z ≥ 9 universe over wide areas (e.g., Bowler et al. 2020; Roberts-Borsani et al. 2022b; Harikane et al. 2022; Finkelstein et al. 2022c; Bagley et al. 2022; Kauffmann et al. 2022) and the discovery of apparent Balmer breaks in galaxies at z ≥ 9 (Zheng et al. 2012; Hashimoto et al. 2018) pointed in the other direction, towards more substantial early star formation activity.
A good baseline for evaluating early star formation activity is through comparison with the predictions of constant star formation efficiency (SFE) models. Already, such models have succeeded in providing a plausible baseline for modeling star formation across cosmic time (e.g. Mason et al. 2015; Bouwens et al. 2015, 2021; Mashian et al. 2016; Harikane et al. 2018, 2022; Oesch et al. 2018; Tacchella et al. 2018; Stefanon et al. 2021, 2022). While there have been a large number of models using the constant SFE assumption to model the evolution of the SFR density across cosmic time, we will test the results against only four: Mason et al. (2015), Bouwens et al. (2015), Tacchella et al. (2018), and Harikane et al. (2022).
[Figure caption, beginning missing] ... Robertson et al. (2022); Curtis-Lake et al. (2022). The red, pink, and light gray shaded regions give the luminosity densities at z ∼ 10-17 inferred based on literature candidates graded here as "robust," "solid," and "possible," respectively. The reason the latter two regions are 3-7× and 8-20× higher, respectively, in luminosity density than inferred by some analyses in the literature, e.g.,  and , is that many analyses only include a fraction of the potentially credible z ≥ 8 candidates from the literature (which are here graded as "solid" or "possible"). The magenta line shows the fiducial star formation history derived by Madau & Dickinson (2014) extrapolated to z > 8 and shifted downward by 0.5 dex to approximately account for the plotted densities being integrated down to a 2-mag shallower limit than in the fiducial Madau & Dickinson (2014) probe. The orange lines indicate the expected evolution in the luminosity density assuming no evolution in the star formation efficiency of galaxies across cosmic time using the models of Mason et al. (2015: solid), Tacchella et al. (2018: dot-dashed), Bouwens et al. (2021: dotted), and Harikane et al. (2022: dashed).
A comparison with the constant star formation efficiency model results is shown in Figure 6. As in other recent studies, the evolution of the LF appears to be significantly in excess of that predicted from constant SFE models at z ≥ 12 (for galaxies more luminous than M_UV ∼ −19). Not only does this clearly appear to be the case for all LF determinations at z ≥ 11, but it is even true if we only make use of sources from the literature that we classify as robust. If one or more of the candidate z ∼ 17 galaxies is actually at such a high redshift (Naidu et al. 2022a; but see also Zavala et al. 2023; Naidu et al. 2022a), differences with the constant SFE models are even larger.
It is unclear whether this indicates the SFE of galaxies is indeed higher or if the IMF of (luminous) galaxies is very different at early times. If the stellar masses in z > 8 galaxies are as high as found in , it would argue in favor of a substantially higher SFE. There is clearly a limit to how high the SFE can be based on the baryon mass in collapsed halos at z > 8, and interestingly enough, both Boylan-Kolchin (2022) and Naidu et al. (2022a) find that some galaxies may be in violation of these limits. Potential resolutions of this enigma could include an evolution in the stellar IMF in star-forming galaxies at z > 8 such that the mass-to-light ratio in early galaxies is substantially lower than at later times in the history of the universe (e.g. Steinhardt et al. 2022; Inayoshi et al. 2022).

Luminosity and SFR Densities of Galaxies at z ≥ 8
An alternate way of assessing the star formation activity in the early universe is by looking at results in terms of the luminosity density and SFR density. In characterizing the evolution, we only consider sources and LF results brighter than −19 mag to avoid extrapolating the LF faintward of what can be well probed with early JWST data, i.e., ∼29 mag, as used both in the present study, , and .
We have adopted such a limit to avoid substantial extrapolations of LF results to much fainter luminosities where they are less well constrained. If we consider extrapolations to −17 mag (as considered in both  ), z ∼ 17 SFR density results derived assuming a faint-end slope of −2.1 (as assumed by  ) vs. assuming a faint-end slope of −3 (as predicted at z ∼ 17 by Mason et al. 2015) differ by ∼1 dex. Given that this difference would then be driven entirely by the assumed faint-end slope, it is clearly preferable to quote SFR density results only to luminosity limits which are well probed by the observations.
In Figure 7, we present our results for the luminosity density evolution from our direct LF analyses. Additionally, we include the equivalent SFR density results, assuming the conversion factor K_FUV is 0.7 × 10^−29 M⊙ year^−1 erg^−1 s Hz from Madau & Dickinson (2014), which assumes a Chabrier (2003) IMF, a constant star formation rate, and metallicity Z = 0.002. For context, we also include the results obtained by several other analyses of the JWST observations (Bouwens et al. 2022a) and also several constant star formation efficiency (SFE) predictions for the luminosity density evolution (Mason et al. 2015; Tacchella et al. 2018; Bouwens et al. 2021; Harikane et al. 2022). A magenta line is also included showing the fiducial star formation history derived by Madau & Dickinson (2014) extrapolated to z > 8 but adjusted to be relevant for SFR probes down to −19 mag. We implement this adjustment as a 0.5 dex offset reflecting the difference in luminosity densities derived by Bouwens et al. (2021) to −19 mag (the limit used here) vs. −17 mag (the limit used by Madau & Dickinson 2014). It is interesting to see how expectations at z > 8-9 have evolved from a decade earlier and how uncertain the SFRD still remains in the first ∼500 Myr.
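The conversion from UV luminosity density to SFR density, and the 0.5 dex shift applied to the extrapolated star formation history, amount to the arithmetic sketched below. The function names and the example luminosity density are hypothetical; the conversion factor is simply the value quoted in the text and is best checked against Madau & Dickinson (2014) before being reused.

```python
import math


def sfr_density(rho_uv, k_fuv):
    """SFR density (Msun/yr/Mpc^3) from a UV luminosity density
    rho_uv (erg/s/Hz/Mpc^3) and a conversion factor k_fuv (Msun/yr per erg/s/Hz)."""
    return k_fuv * rho_uv


def shift_dex(value, offset_dex):
    """Scale a quantity by an offset expressed in dex (powers of ten)."""
    return value * 10.0 ** offset_dex


K_FUV_QUOTED = 0.7e-29   # value as quoted in the text, not independently verified
RHO_UV_EXAMPLE = 1.0e25  # hypothetical luminosity density, for illustration only

SFRD_EXAMPLE = sfr_density(RHO_UV_EXAMPLE, K_FUV_QUOTED)
# Shifting downward by 0.5 dex multiplies by 10**-0.5, i.e. roughly 0.32.
SFRD_SHIFTED = shift_dex(SFRD_EXAMPLE, -0.5)
LOG_SFRD_EXAMPLE = math.log10(SFRD_EXAMPLE)
```

Because the conversion is linear, any ratio of luminosity densities (for example between the "robust" and "possible" samples) carries over unchanged to the SFR densities.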
In Figure 7, we also show the luminosity density results derived from our literature samples of the same fields. Separate results are presented for candidates categorized as "robust," "solid," and "possible" with the shaded red, pink, and grey regions, respectively. For additional reference, we include as a solid black line and upward arrows the implied lower limits on the luminosity densities at z > 10 based on the recent JWST Advanced Deep Extragalactic Survey (JADES) spectroscopic results over the HUDF/XDF region (Robertson et al. 2022; Curtis-Lake et al. 2022). For those limits, we adopt the luminosities measured by Robertson et al. (2022) and assume a total search area of 2 × (1.5 arcmin)^2 and that sources can be selected over the entire volume z = 10-12 and z = 12-14.
It is striking how much higher the implied luminosity densities of the "possible" candidates are relative to the results derived from those candidates in the other categories. Results including all of the candidates are ∼3× and ∼8× higher than those from candidates we grade as "solid" and "robust," respectively, at z ∼ 10, and ∼7× and ∼20× higher, respectively, at z ≥ 12. These same luminosity density results are also significantly in excess of our own luminosity density results as well as the results of , .
Clearly, much of the excess could be due to the presence of potentially substantial numbers of lower-redshift contaminants in various z ≥ 8 selections. The detection of possibly significant flux blueward of the breaks in the median stacks of the "possible" candidates is indeed suggestive of such a conclusion (cf., §3.3, Appendix C). There are clearly large uncertainties in what fraction of these fainter sources are at high redshifts. As we demonstrate in Appendix E, the assessment of the reliability of specific z ≥ 8 candidates can vary substantially between the different studies. It is indicative of the challenges with these early data sets that our independent evaluation of the candidates from , , , and our own selections place a non-negligible fraction of these candidates ( 20%) in our lowest quality bin (Table 4).
Meanwhile, results using the "robust" candidates appear to be in excellent agreement with the collective LF results of  and , while our own results and those of  agree better with the results obtained using the "solid" candidates. The  LF results appear to be ≈2× higher than the  results due to the ∼2× higher completeness of the  selection to "robust"+"solid" z ≥ 8 candidates from the literature (Table 4). [4] Without spectroscopy, it is difficult to know which of these two results is more reliable. A key question is the extent to which sources in our "solid" literature sample are at z > 8. Simulation results from both  and Larson et al. (2022) indicate that z ≥ 8 selections over CEERS-like data sets might well include an appreciable number of lower-redshift interlopers, even restricting such selections to sources with >80% of the integrated likelihood at z > 5.5 (as is required for sources that make up our "solid" literature selections). Based on the expected contamination in the first JWST fields (likely due to the limited depth of the data blueward of the break),  require that z ≥ 9 candidates satisfy an especially demanding $\Delta(\chi^2(z_{\rm low}) - \chi^2(z_{\rm high})) > 9$ selection criterion to be included in their high-redshift samples.
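The two quality cuts mentioned here, a minimum fraction of the integrated redshift likelihood above z = 5.5 and a minimum χ² difference between the best low-redshift and high-redshift fits, can be sketched as follows. The function names, the trapezoidal integration on a tabulated grid, and the toy likelihood are assumptions made for illustration and do not reproduce any particular study's pipeline.

```python
def _trapezoid(x, y):
    """Trapezoidal integral of y over x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def fraction_above(z_grid, likelihood, z_cut):
    """Fraction of the integrated redshift likelihood lying at z > z_cut."""
    total = _trapezoid(z_grid, likelihood)
    if total <= 0.0:
        raise ValueError("likelihood must integrate to a positive value")
    zs, ls = [], []
    for i, (z, like) in enumerate(zip(z_grid, likelihood)):
        if z < z_cut:
            continue
        if not zs and i > 0 and z > z_cut:
            # Linearly interpolate the likelihood at the cut itself.
            z0, l0 = z_grid[i - 1], likelihood[i - 1]
            zs.append(z_cut)
            ls.append(l0 + (like - l0) * (z_cut - z0) / (z - z0))
        zs.append(z)
        ls.append(like)
    return _trapezoid(zs, ls) / total if len(zs) > 1 else 0.0


def passes_delta_chi2(chi2_low_z, chi2_high_z, threshold=9.0):
    """True if the high-redshift fit is preferred by more than `threshold` in chi^2."""
    return (chi2_low_z - chi2_high_z) > threshold


# Toy likelihood: flat in redshift between z = 0 and z = 10.
Z_GRID = [float(z) for z in range(11)]
FLAT_LIKELIHOOD = [1.0] * 11
FRACTION_FLAT = fraction_above(Z_GRID, FLAT_LIKELIHOOD, 5.5)
```

A candidate with a flat likelihood like the toy one would fall well short of the 80% threshold, whereas a sharply peaked high-redshift solution would pass it.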
Another concerning aspect of sources in our "solid" literature selections is the much less significant overlap between candidates reported in different studies. While 90% of the z ≥ 9 candidates in our "robust" literature selections are identified as part of multiple studies, only 26% of the z ≥ 9 candidates in the "solid" literature selections are found in multiple studies. [5] This suggests that a larger percentage of sources in our "solid" literature sample may in fact be lower-redshift contaminants, but it is a huge open question what that percentage is.
Even median stacking of the SED results is of little use in ascertaining whether sources in our "solid" literature selections are reliable. As we show in Appendix C, very similar stack results are obtained using either the "robust" or "robust"+"solid" subsamples of literature candidates. In both cases, a pronounced ∼1.5-mag spectral break is seen, with no significant flux blueward of the break. Also both stacks reveal a blue spectral slope redward of the break.
Fortunately, an increasing amount of spectroscopy is becoming available for z > 8 selections, particularly based on the JADES and CEERS programs (e.g., Tang et al. 2023; Fujimoto et al. 2023; Cameron et al. 2023; Saxena et al. 2023; Bunker et al. 2023), spectroscopically confirming many sources out to a redshift z ≈ 9.5 where the strong [OIII]λλ4959,5007 doublet can be detected at high S/N for star-forming sources and in some cases even earlier (e.g., Bunker et al. 2023). However, it should be noted that not all z ≥ 8 candidates are being confirmed to be at z ≥ 8. The z ∼ 8 candidate 13050 from  has been found to have a redshift z = 5.62 and to be an AGN (Kocevski et al. 2023). This is a particularly interesting example since it adds weight to the concern that our photometric redshift SED templates are not yet as complete as we would like.

[4] We remark in passing that the  LF results appear to be more consistent with the empirical completeness estimates we derive on the basis of our literature selections (Table 4) than is the case for either the  or  analyses, where the completeness of their selections is ∼2× lower than assumed in their LF analyses.
[5] For reference, the percentage of z ≥ 9 candidates from our "possible" literature selections that occur in more than one study is just 5%. This demonstrates there is really a difference in the quality of the candidates that make up our literature subsamples.
Regardless of what the actual SFR density is at z ≥ 10, i.e., whether it is closer to the "robust" or "solid" literature results shown in Figure 7, essentially all of the present results lie in significant excess of the constant SFE models (Mason et al. 2015; Tacchella et al. 2018; Bouwens et al. 2021; Harikane et al. 2022) by factors of ∼2-6 at z ∼ 12 and by even larger factors at z > 12.
It seems likely that at least part of the excess at z > 9 could be explained by the impact of noise in driving photometric redshift estimates to somewhat higher values than later found through spectroscopy. The approximate amplitude of this effect appears to be Δz ∼ 1 at z > 7 (e.g., Muñoz & Loeb 2008; Bouwens et al. 2022b; Kauffmann et al. 2022; Fujimoto et al. 2023). This appears to be due to typical photometric redshift estimates adopting a flat prior in redshift and thus not taking into account the fact that luminous sources are more prevalent at lower redshift than they are at high redshift (Muñoz & Loeb 2008).
There have been a variety of different explanations offered for this deviation from the constant SFE predictions in the literature. One possibility has been to suppose that the mass-to-light ratios of galaxies in the early universe are much lower than at later points in cosmic time, which could result from a change in the effective IMF of galaxies in the z > 10 universe to one which is much more top heavy (e.g. Steinhardt et al. 2022; Inayoshi et al. 2022).
Other possibilities have included the hypothesis that AGN contribute much more significantly to the light from the earliest generation of galaxies (e.g.  ), that there is much greater scatter in the star formation rates of galaxies in the early universe away from the main star-forming sequence (e.g. Mason et al. 2023), as well as a number of other explanations (e.g. Ferrara et al. 2022; Mirocha & Furlanetto 2023; Kannan et al. 2022; Lovell et al. 2023). Ascertaining which of these explanations is correct will ultimately require an extensive amount of follow-up observations with ALMA and JWST, especially involving spectroscopy, as demonstrated by e.g. the recent confirmation of a z = 9.76 source, z > 10 sources by the JADES team (Roberts-Borsani et al. 2022a; Curtis-Lake et al. 2022; Robertson et al. 2022; Bunker et al. 2023), and z = 6-9 sources in CEERS (e.g., Tang et al. 2023; Fujimoto et al. 2023).

SUMMARY
We have derived luminosity functions, and set constraints on the UV luminosity and SFR density from z ∼ 8 to z ∼ 17, using the three most well-studied JWST NIRCam data sets from the first 5 months of JWST science operations, namely, the SMACS0723 cluster field (Pontoppidan et al. 2022), the GLASS Abell 2744 parallel field (Treu et al. 2022), and four CEERS extragalactic fields.
We have selected samples of z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17 galaxies in these fields, and made full use of the very extensive selections done by others to date. In particular, we have investigated the challenges of the selection of z ≥ 8 galaxies and derivation of LF results from these early JWST NIRCam observations. Even with a very conservative approach to selections, both from our own and similarly sub-selecting those of other studies, we find that luminous galaxies in the first 400-500 Myr are as enigmatic as the first JWST results suggested.
We first make use of two different reductions of the NIRCam observations to test the sensitivity of z ≥ 8 selections to the reduction technique. The first set of reductions we utilize relies on the NIRCam pipeline, while the second leverages an alternate set of reductions made with the NIRCam pipeline. Both reductions take advantage of advances made in the calibrations of the NIRCam zeropoints, as well as including steps to minimize the impact of 1/f noise and "snowball" artefacts.
Using sources from the above selection and estimates of the selection volumes in our search fields, we have derived estimates of the LF at z ∼ 8, z ∼ 10, z ∼ 13, and z ∼ 17. While the uncertainties are still very large, our LF results are suggestive of factors of 6 and 6 decreases in the normalization of the LF from z ∼ 8 to z ∼ 13 and z ∼ 17, respectively. Not surprisingly, the results we obtain are similar to the relatively mild evolution in the luminosity density already reported in Naidu et al. (2022b) and Bouwens et al. (2022a), among others.
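As a rough illustration of how a source count and a selection volume translate into an LF estimate, the sketch below uses a simple binned estimator, the number of sources divided by the selection volume and bin width, with a Poisson error. This is a simplified stand-in and not necessarily the estimator used for the results above; the function name and the input values are hypothetical, and the sqrt(N) error is only a crude approximation for the small samples considered here.

```python
import math


def binned_lf(n_sources, volume_mpc3, bin_width_mag):
    """Return (phi, sigma_phi) in Mpc^-3 mag^-1 for a single magnitude bin.

    Simple binned estimator: phi = N / (V * dM), with a sqrt(N) Poisson error.
    """
    if volume_mpc3 <= 0 or bin_width_mag <= 0:
        raise ValueError("volume and bin width must be positive")
    norm = volume_mpc3 * bin_width_mag
    phi = n_sources / norm
    sigma_phi = math.sqrt(n_sources) / norm
    return phi, sigma_phi


# Hypothetical inputs, chosen for illustration only (not values from this work).
phi, sigma_phi = binned_lf(n_sources=4, volume_mpc3=2.0e5, bin_width_mag=0.5)
print(f"phi = {phi:.1e} +/- {sigma_phi:.1e} Mpc^-3 mag^-1")
```

With only a handful of sources per bin, the fractional uncertainty is of order unity, which is one reason the LF normalizations above remain so poorly constrained.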
We also take these results and set constraints on the UV luminosity and SFR density from z ∼ 17 to z ∼ 8 for galaxies more luminous than −19 mag. As with the LF results, the luminosity density and SFR density, both from our direct determinations and from the results based on likely robust z ∼ 11-13 candidates from the literature, lie significantly in excess of the constant star formation efficiency (SFE) models, by factors of ∼2-6. The interpretation of these results is unclear, and it is an open question whether the new results indicate that star formation in galaxies is indeed more efficient or that the IMF of (luminous) galaxies is very different at early times.
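To make the −19 mag limit concrete, the following sketch sums a stepwise LF over bins brighter than that limit to obtain a UV luminosity density. It assumes the standard AB-magnitude relation between absolute magnitude and specific luminosity; the function names and the LF values are hypothetical placeholders rather than the determinations of this work, and no conversion to SFR density is attempted.

```python
import math

PC_IN_CM = 3.0857e18  # one parsec in cm
# Surface area of a sphere of radius 10 pc, in cm^2.
AREA_10PC_CM2 = 4.0 * math.pi * (10.0 * PC_IN_CM) ** 2


def abs_mag_to_lnu(m_uv):
    """Convert an AB absolute magnitude to L_nu in erg s^-1 Hz^-1."""
    return AREA_10PC_CM2 * 10.0 ** (-0.4 * (m_uv + 48.6))


def luminosity_density(bins, bin_width_mag, mag_limit=-19.0):
    """Sum phi * L_nu * dM over bins with centres brighter than mag_limit.

    bins: iterable of (M_UV bin centre, phi in Mpc^-3 mag^-1).
    Returns the luminosity density in erg s^-1 Hz^-1 Mpc^-3.
    """
    return sum(
        phi * abs_mag_to_lnu(m_uv) * bin_width_mag
        for m_uv, phi in bins
        if m_uv < mag_limit
    )


# Hypothetical stepwise LF, for illustration only.
example_bins = [(-21.0, 1.0e-6), (-20.0, 1.0e-5), (-18.5, 1.0e-4)]
rho_uv = luminosity_density(example_bins, bin_width_mag=1.0)
print(f"rho_UV = {rho_uv:.2e} erg/s/Hz/Mpc^3")
```

In this toy example the faintest bin is excluded by the magnitude cut, so the result depends only on the two brightest bins, mirroring how the quoted densities depend on a small number of luminous candidates.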
As a complement to direct determinations of the LF at z ≥ 8, we also derive LF and luminosity density results by taking advantage of the full samples of z ∼ 10, z ∼ 13, and z ∼ 17 candidate galaxies that have been identified to date over the three most well-studied fields. We segregate these candidates into three different samples, "robust," "solid," and "possible," according to how likely the sources are to be at z > 5.5 based on our photometry in both our fiducial and secondary reductions of the NIRCam imaging observations.
We first considered the luminosity densities we would derive including all z ≥ 8 candidates reported to date over the three most studied fields. Remarkably, we find 7× and 20× higher luminosity densities at z ≥ 12 when including the "solid" and "possible" candidates than when relying on the "robust" candidates from the literature alone. These results demonstrate how uncertain the luminosity densities are at z > 6 and how much the results depend on the extent to which lower-redshift sources contaminate the z ≥ 8 selections.
Even allowing for a substantial amount of contamination in our selections of "possible" z ≥ 8 sources from the literature, large (∼0.5-1.0 dex: factors of 3 to 10) differences exist between the luminosity density results derived from sources graded "robust" and those graded "solid." If the bulk of the z ≥ 10 candidates graded "solid" are instead at lower redshift, the true luminosity density results at z ≥ 10 would be more in line with the lower determinations in the literature, which are consistent with the recent spectroscopic results of Curtis-Lake et al. (2022) and closer to the predictions of the constant SFE models. Some recent simulation results (e.g. Larson et al. 2022) are suggestive of at least modest levels of contamination in the first JWST z ≥ 8 selections with NIRCam.
On the other hand, if the bulk of the z ≥ 8 candidates graded "solid" are bona fide, then the LFs and luminosity density at z ∼ 10 and z ≥ 12 could be up to ∼3× and ∼7× higher, more in the range of the LF results we derive from our own selection of z ≥ 8 sources and more consistent with the higher determinations in the literature. Supporting these high luminosity density results are the median stack results we obtain for our selection of "solid" candidates from the literature, which appear to have almost identical characteristics to those we obtain from a similar stack of "robust" candidates from the literature.
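The differences quoted in the preceding paragraphs are given both in dex and as multiplicative factors. As a small worked check of that arithmetic, the sketch below converts between the two; the helper names are hypothetical and introduced only for this illustration.

```python
import math


def dex_to_factor(dex):
    """Convert a difference in dex (log10 units) to a multiplicative factor."""
    return 10.0 ** dex


def factor_to_dex(factor):
    """Convert a multiplicative factor to a difference in dex."""
    return math.log10(factor)


# 0.5 and 1.0 dex correspond to factors of about 3 and 10.
for dex in (0.5, 1.0):
    print(f"{dex:.1f} dex = factor of {dex_to_factor(dex):.2f}")

# Factors of 3, 7, and 20 expressed in dex.
for factor in (3.0, 7.0, 20.0):
    print(f"{factor:.0f}x = {factor_to_dex(factor):.2f} dex")
```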
Whatever the reality is, it is clear that huge open questions remain regarding the true LF and luminosity density results at z ≥ 8. To resolve these open questions, deeper imaging observations and follow-up spectroscopy with JWST NIRSpec and the grisms will be required, allowing for significantly more reliable z ≥ 8 selections and LF determinations going forward. Fortunately, there are already significant ongoing efforts to obtain sensitive imaging over fields like the HUDF (e.g. Bouwens et al. 2022a; Robertson et al. 2022), as well as sensitive spectroscopic campaigns by JADES and other programs (e.g. Curtis-Lake et al. 2022; Bunker et al. 2023; Cameron et al. 2023; Tang et al. 2023; Fujimoto et al. 2023), that provide the needed new data.

APPENDIX A: z ≥ 8 CANDIDATES IDENTIFIED IN OUR SECONDARY REDUCTIONS
As a test of the sensitivity of z ≥ 8 selections to the NIRCam reductions utilized, we perform a second search for z ≥ 8 galaxies using our secondary reductions (§2.1). A total of 22 z ∼ 8, 13 z ∼ 10, 3 z ∼ 13, and 1 z ∼ 17 galaxies are identified in these reductions.
The coordinates, photometric redshifts, apparent magnitudes, and spectral break amplitudes of the z ≥ 8 candidates we find are indicated in Table A1. Also presented are the difference between the minimum χ² found for z > 5.5 and z < 5.5 fits to the observed SEDs of the sources, the estimated likelihood that a candidate has a redshift in excess of 5.5, and any earlier studies that identified a given source as part of their z ≥ 8 searches.
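To give a feel for how a χ² difference between the best high-redshift and low-redshift fits relates to a likelihood of lying above redshift 5.5, the sketch below treats the two best fits as two competing hypotheses with likelihoods proportional to exp(−χ²/2) and equal priors. This is a deliberate simplification and an assumption of this illustration: a full analysis would integrate the redshift likelihood distribution rather than compare two best-fit values. The function name is hypothetical.

```python
import math


def prob_high_z(delta_chi2):
    """Approximate P(z > 5.5) from delta_chi2 = chi2_min(z>5.5) - chi2_min(z<5.5).

    Two-hypothesis approximation with likelihoods proportional to exp(-chi2 / 2)
    and equal priors. Negative delta_chi2 favours the high-redshift solution.
    """
    if delta_chi2 > 1400.0:
        # Avoid overflow in exp(); the probability is effectively zero here.
        return 0.0
    return 1.0 / (1.0 + math.exp(0.5 * delta_chi2))


for delta in (-9.0, -4.0, 0.0):
    print(f"delta chi2 = {delta:+.0f} -> P(z > 5.5) ~ {prob_high_z(delta):.3f}")
```

Under this approximation, χ² differences of −9 and −4 correspond to probabilities of roughly 0.99 and 0.88, respectively, so modest shifts in the measured photometry between reductions can move a candidate appreciably in likelihood.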

APPENDIX B: ASSESSMENTS OF z ≥ 9 CANDIDATES IN THE LITERATURE
Given the considerable uncertainties regarding both the identity and total number of high-quality z ≥ 9 candidate galaxies that have been identified to date, we have performed independent photometry on z ≥ 8 candidates reported in a large number of manuscripts, as described in §3.3. Use was made of reductions of the NIRCam imaging data with both pipelines. We then sorted these z ≥ 9 candidates into three categories: "robust," "solid," and "possible." The purpose of this appendix is to present the candidates from the literature that we classify as "solid" and "possible," which we place in Tables B1-B2 and B3-B5, respectively. Those classified as "robust" have already been tabulated in Table 3.

APPENDIX C: MEDIAN STACK OF z ≥ 9 CANDIDATES WITH VARIOUS QUALITY FLAGS
In evaluating the quality of z ≥ 9 candidates from the literature, we place the candidates in three different categories, "robust," "solid," and "possible," depending on the relative likelihood we estimate for these sources to lie at z < 5.5 or z > 5.5 using our own photometry (see §3.3 for how these sets are defined).
In order to interpret each of these designations and determine whether the differences are meaningful, we construct median stacks of the flux in different passbands for candidates in each category. The results are presented in Table C1.
While the median stack results in the "robust" and "solid" categories show no significant flux blueward of the nominal spectral breaks in the z ≥ 8 candidates, the median stack in the "possible" category does show tentative 1-1.5σ detections in each of the bands blueward of the break. Additionally, the median stack results in the "robust" and "solid" categories show larger spectral breaks, i.e., >1.8 mag, than the median stack results in the "possible" category, where the break only has an amplitude of ∼1-1.5 mag.
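The median stacking described in this appendix can be sketched as a per-band median over the candidates in one category. The example below is a minimal illustration under stated assumptions: the data layout (one dictionary of band fluxes per source), the function name, and the flux values are all hypothetical, and the band names are used only as example labels.

```python
from statistics import median


def median_stack(sources):
    """Return the median flux in each band over a list of sources.

    sources: list of dicts mapping band name -> flux in nJy.
    Bands missing for a given source are skipped for that source.
    """
    bands = sorted({band for src in sources for band in src})
    return {
        band: median(src[band] for src in sources if band in src)
        for band in bands
    }


# Hypothetical fluxes in nJy, for illustration only.
example_sources = [
    {"F115W": 1.0, "F200W": 20.0},
    {"F115W": -2.0, "F200W": 30.0},
    {"F115W": 4.0, "F200W": 25.0},
]
stack = median_stack(example_sources)
print(stack)
```

A median is less sensitive than a mean to a single contaminated source, which makes it a natural choice when the purity of each category is itself in question.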

APPENDIX D: UV LF AT z ∼ 8
It is useful to test our procedures for deriving the LF at high redshifts using JWST data to ensure we can arrive at reliable results.
To this end, we made use of our z ∼ 8 F090W-dropout samples and the same methodology as we use for our z ≥ 8 analyses to derive the LFs at z ∼ 8. The results are presented in Figure D1 and in Tables 6-7, and it is clear that the results are consistent with what we derived earlier in Bouwens et al. (2021) on the basis of sensitive imaging observations with HST.
As such, we can conclude that our procedures should produce reliable LF results at z ≥ 8, assuming we are able to identify significant samples at z ≥ 8 that are largely free of contamination from lower-redshift interlopers.

APPENDIX E: STUDY-TO-STUDY SCATTER IN THE ASSESSMENT OF VARIOUS z ≥ 9 CANDIDATES
Ascertaining whether individual z ≥ 8 candidates are at low or high redshift can be challenging in specific cases, due to uncertainties in both the photometry of individual sources and the optimal SED templates to utilize in performing the fits.
As an illustration of these uncertainties, Figure E1 shows a comparison of the Δ(χ²_min,z>5.5 − χ²_min,z<5.5) results obtained using our photometry on our fiducial reductions and those obtained using our secondary reductions. As a second illustration of the study-to-study differences, Figure E2 shows a comparison between the Δ(χ²_min,z>5.5 − χ²_min,z<5.5) results from another study and the results we obtain using our reductions. In both comparisons, there is clearly a significant amount of scatter in the assessments that are made about specific candidates.

Table B4. Sample of z ∼ 10, z ∼ 13, and z ∼ 17 candidates that we deem to be "possible" using our own photometry on two separate reductions of the NIRCam data.
Figure E1. Difference between the minimum χ² achieved with z > 5.5 SED fits to specific sources and that obtained with z < 5.5 SED fits. Δχ² values less than −9 or −4 tend to indicate sources are at z > 5.5 at high confidence. We note that there is nevertheless a substantial dispersion in the derived Δχ² values depending on the reductions. If the uncertainties are similar in various literature studies, it could point to there being a significant amount of contamination and incompleteness in existing z > 5.5 selections.
Table C1. Median stack results for z ≥ 8 candidates from the literature which we segregate into the "robust," "solid," and "possible" categories based on our own photometry and SED fits. Columns: Band; Median Flux (nJy) for the "Robust," "Solid," and "Possible" samples.
Figure E2. Similar to Figure E1 but comparing the source-by-source results of another study with those we obtain using our fiducial reductions. Sources shown in blue are included in our fiducial sample while those shown in green are not. Note the substantial dispersion in the derived Δχ² values depending on the analysis.