Content from Why distribute your data through VizieR?


Last updated on 2023-09-14 | Edit this page

Overview

Questions

  • What is VizieR?
  • What is a VizieR catalogue?
  • How is VizieR interoperable?

Objectives

  • Describe the contents of a VizieR catalogue
  • List some VO tools interoperable with VizieR catalogues
  • Outline the benefits of submitting your data to VizieR

What is VizieR?


VizieR provides the most complete library of published astronomical catalogues - tables and associated data - with verified and enriched data, accessible via multiple interfaces. Query tools allow the user to select relevant data tables and to extract and format records matching given criteria. Currently, ~ 24 000 catalogues are available (as in August 2023).

Interactive window: VizieR homepage webpage. Note that you can type in (it is not an image).

VizieR was initially started as a joint effort of CDS (Centre de Données astronomiques de Strasbourg) and ESA-ESRIN (Information Systems Division), and is now fully managed by CDS. VizieR has been available since 1996, and was described in a paper published in 2000 (DOI: 10.1051/aas:2000169).

Note that VizieR does not contain all available online catalogues: some catalogues are not suitable and some less frequently used catalogues have not yet been incorporated into the VizieR database. The full list of catalogues is available from there: Catalogue collection page.

Screenshot: VizieR Catalogue collection page, example for 'mission:gaia', from July 2023
Figure: VizieR Catalogue collection page - example when selecting Gaia mission catalogues (screenshot)

VizieR and more generally speaking the CDS are quite involved in the data preservation aspect, as evidenced by the DSA and CTS certifications awarded over the last years. More details on the topic can be found at the page below.

Interactive window: CDS webpage related to the CoreTrustSeal certification.

What is a VizieR catalogue?


We call a VizieR catalogue:

  • A set of documented tables (including a ReadMe file), linked to a published paper
  • With an indexation by keywords, authors, date, UCDs …
  • With a global indexation by position whenever possible
  • And citation capability (bibcode or DOI)

A catalogue can optionally contains other data types, also associated with publications. Here is a non exhaustive list:

Interactive window: VizieR catalogue landing page, example for Gaia DR3 Part 1 (VizieR name: I/355).

VizieR catalogue and CDS tools


Once a catalogue is ingested in VizieR, one can access a range of tools:

Figure: VizieR and the interoperability with the CDS services. The full list of services is described just above.
Figure: VizieR catalogues accessible through CDS services

Why distribute your data through VizieR?


In addition to the interoperability with the VO tools mentioned above, distributing your data through VizieR means adding values to them.

Summary: Benefits of using VizieR


Key Points

Distribute your data in a trusted repository in accordance with Open Data principles

  • Data preservation: The CDS has been awarded the Core Trust Seal.
  • Discovery: Take advantage of the VizieR services allowing for the discovery, and a good data visibility
    • Dedicated indexations
    • Quality and a good data description
    • Generate persistent identifier (bibcode, DOI) for citation
    • Data are available through the Virtual Observatory (VO) services
    • Data can be queried by various means (simple search, TAP, Python …) and retrieved in different format (ASCII, Fits, VOTable …)

Next chapters


In the next chapters, you will learn what type of data can be submitted, how to prepare and submit them to VizieR, and finally how to search your data using the EOSC tools.

Content from FAIR principles for astronomical data


Last updated on 2023-11-06 | Edit this page

Overview

Questions

  • What are the FAIR principles?
  • What does it mean for astronomical data?
  • Do and don’t when publishing data

Objectives

  • Cite the Findable, Accessible, Interoperable and Reusable (FAIR) principles
  • Create FAIR tables

Overview


The CDS and other astronomical data centers are storing and distributing the astronomical data to promote their usage primarily by professional astronomers.

In order to ensure the scientific quality of the data, we therefore require that the data are related to a publication in a refereed journal, either as tables or catalogues actually published, or as a paper describing the data and their context.

Furthermore, the data should be described accurately enough to allow an unambiguous interpretation (of the data), and facilitate their usability.

In this chapter, we will review the best practices for data publication in the astronomical field.

Figure -- Summary data journey from a publication to VizieR and then EOSC: beginning of the journey - step data published in a refereed paper
Figure: Journey from a publication to EOSC, beginning of the journey - step 1 “peer reviewed datasets”

What are the FAIR principles?


Formalized by Wilkinson et al. in 2016 (DOI: 10.1038/sdata.2016.18), the FAIR Guiding Principles have emerged over the last few years. The underlying idea is to provide a set of guidelines (neither a standard not a specification) for making research data in a broad sense (not only data, but also algorithms, tools, and workflows that led to that data) Findable, Accessible, Interoperable and Reusable, ultimately ensuring standardised machine actionability.

Many great courses can be found online explaining those principles in more details. In this course, we will summarize them and focus more on what it means for astronomical data.

Findable Accessible Interoperable Reusable principles cartoon from 'Open Science Training Handbook'Credit: Open Science Training Handbook, CC0 1.0 Universal, https://doi.org/10.5281/zenodo.1212496

The FAIR Guiding Principles in more depth

The FAIR Guiding Principles for scientific data management and stewardship as outlined by M. Wilkinson et al. (2016).

The first step before reusing the data is to be able to find them. The metadata and data should be easy to find for both humans and computers.

F1. (Meta)data are assigned a globally unique and persistent identifier

F2. Data are described with rich metadata (defined by R1 below)

F3. Metadata clearly and explicitly include the identifier of the data it describes

F4. (Meta)data are registered or indexed in a searchable resource

Once the data can be found, the user needs to know how to access them. That could include authentification and authorisation. The accessibility of the data does not imply that they are fully open to everyone.

The data should be ‘as open as possible, as closed as necessary’, following the H2020 Program Guidelines on FAIR Data.

A1. (Meta)data are retrievable by their identifier using a standardized communications protocol

  • A1.1 The protocol is open, free, and universally implementable
  • A1.2 The protocol allows for an authentication and authorization procedure, where necessary

A2. Metadata are accessible, even when the data are no longer available

Those data can then be combined with different datasets, and interoperate with other applications or workflows for analysis, storage, and processing, thanks to rich metadata.

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.

I2. (Meta)data use vocabularies that follow FAIR principles

I3. (Meta)data include qualified references to other (meta)data

The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described, including a clear and accessible license. It should be clear how the data can and cannot be reused, remixed or redistributed.

R1. Meta(data) are richly described with a plurality of accurate and relevant attributes

  • R1.1. (Meta)data are released with a clear and accessible data usage license
  • R1.2. (Meta)data are associated with detailed provenance
  • R1.3. (Meta)data meet domain-relevant community standards

Best Practices for Data Publication in the Astronomical Literature


In 2022, Chen et al. (DOI: 10.3847/1538-4365/ac6268) published a set of guidelines summarizing the best practices for publishing data in astronomy and astrophysics journals. This article is the result of a joint effort from several data centres (NED, CfA, Caltech, CDS, AAS, ADS, among others), as these recommendations are applicable to all those who do data processing.

These recommendations are intended for authors, referees, and science editors to consult in order to avoid various pit-falls that often impede the interpretation of data and metadata by readers, and parsing by software, and therefore also complicate and delay integration of the data into astronomical databases.

Everyone is encouraged to read their paper. For the sake of this course, only their Checklist of Recommendations for Publishing Data in the Literature (Appendix A of their paper) is included below.

Items covered in the Checklist (and also below):

Keep calm and carry on

A lot of items will be covered in this section. We are aware that you could be a bit overwhelmed.

The most important points are highlighted in bold, and some tips when creating tables are given at the end.

Do your best and do not hesitate to contact us if needed: cds-cats(at)unistra.fr.


General rules(Section §2 from article)

  1. Define all symbols, acronyms, and abbreviations at first use.
  2. Provide uncertainty and confidence level when reporting a new measurement.
  3. Present the appropriate number of significant figures for numerical measurements and uncertainties that match the precision of the measurements.
  4. Report the units for measurements if present, and adopt commonly-used ones.
  5. Indicate preferred values if applicable.

Examples

a) “... used by the Dark Energy Survey (DES)...”
b) The period of a periodic phenomenon should be given as 
“P = 1.23456±0.00012 days” instead of “P = 1.23456(12) days”. 
c) Present a measurement as 0.123±0.002, 
not 0.12345±0.002 or 0.123±0.00234.
d) Present source color “(B-V) = 0.45 mag” instead of “(B-V) = 0.45”
e) State your preferred solution in the text and have a dedicated column 
for the adopted solution in your Table, as shown in Table 8 from VizieR catalogue J/ApJS/261/19

Small extract of Table 8 from VizieR catalogue J/ApJS/261/19, displaying 4 columns for the effective temperature: TeffR, TeffS, TeffP and Teff, the latter one corresponding to their adopted temperature
Figure: Extract of Table 8 from VizieR catalogue J/ApJS/261/19


Nomenclature (§2.1)

  1. Provide the complete name for each object. (§2.1.1)
  2. Include the “J” in names based on J2000 coordinates. (§2.1.1)
  3. Insert spacers between a catalogue name and the identifiers within the catalogue. (§2.1.1)
  4. Distinguish between part of an object and the object itself. (§2.1.1)
  5. Do not use the same name for different objects. (§2.1.1)
  6. Always assign a name and verify the name is unique. (§2.1.2)
  7. Keep the appropriate number of significant figures in coordinate-based names. (§2.1.2)
  8. Use established names for known objects and check for the correct formatting. (§2.1.3)
  9. Confirm the names and positions for cross-identifications. (§2.1.4)
  10. Cross-match the same objects in different tables within the same article. (§2.1.4)

Examples

## IAU conventions
a) Use “2MASS J15334944+3759282”, and not “2M 1533+3759”.
b) Use “BR J0529-3526” instead of “BR0529-3526” (which is B1950 by default).
c) Use “B3 2327+391”, and not “B32327+391”.
d) Use “3C 295 cluster” instead of “3C 295” when referring to the cluster.
e) The tau Ceti system now has four planets: e, f, g, and h. 
Since tau Ceti b, c, and d were refuted, the letter designations b, c, 
and d were not reused for the newer planets to avoid confusion.

## New objects
f) Confirm any new acronym to the Dictionary of Nomenclature of 
Celestial Objects.
g) J092712.64+294344.0 indicates a positional accuracy of 0.15 arcsec 
while J092712.644+294344.02 indicates an accuracy of 0.015 arcsec.

## Known objects
h) Creating a fanciful name for an object with an existing designation
is especially discourage.
h) Validate all the identifiers for known objects in their publications 
through Sesame.
A service hosted by CDS that queries NED, SIMBAD, and VizieR to help 
resolve object names.

## Cross-identifications
i) Always verify with established databases that all of the names
given to an object are valid cross-identifications for the object and 
that the listed positions are for the same object.

Quiz

Quiz: Why are these astronomical designations improper?

Q1) J2217+2104

Q2) SN 95B

Q3) HESS J232+202

Q4) BR 0529-3526

Q5) B0008+006

Q6) DEM45

Q7) Gaia 2448177130188111232

Q8) Gaia DR 2 2.7904e18

Q9) mu cep

Look for these objects in Simbad.

Can you find them?

I1) Insufficient precision in RA and DEC causes confusion. In this case, it could correspond to many objects and different surveys.

I2) Incomplete name can be interpreted into different objects.

I3) Leading zero in RA is missing and can cause misinterpretation of the RA at 23 hours instead of 02 hour.

I4) Missing letter J to specify J2000 equatorial coordinates.

I5) Name prefix is needed to distinguish between different objects.

I6) H II regions in LMC or SMC should be indicated with “L” or “S” to avoid ambiguity.

I7) Database objectID number is used without specifying release number. The same running number may refer to a different source in a different release.

I8) ID is written in scientific notation, making it impossible to retrieve the actual object (which is a 19-integer number).

I9) Ambiguous name can be interpreted into different objects.

R1) LAMOST J221750.59+210437.1 and SDSS J221746.82+210424.1 are potential matches for this incomplete name

R2) SN 1895B and SN 1995B are different targets

R3) HESS J0232+202

R4) BR J0529-3526

R5) QSO B0008+006 (Redshift z = 2.3) and IVS B0008+006 (Redshift z = 1.5) are different targets

R6) DEM L 045, or DEM S 045

R7) Gaia DR3 2448177130188111232

R8) Gaia DR2 2790494815860044544

R9) µ Cep (Simbad Identifier: * mu. Cep, 21h43m30.46s, +58d46m48.2s, J2000) and MU Cep (Simbad Identifier: V* MU Cep, 22h23m38.63s, +57d40m50.8s, J2000) are different targets


Astrometry (§2.2)

  1. Provide the best available coordinates.
  2. Specify the celestial reference system and/or frame.
  3. Indicate the equinox and epoch of observation when necessary.
  4. State the wavelength range from which astrometry is obtained.

Examples

a) Complete celestial coordinates are preferred, e.g.,
12h34m56.78s, +12d34m56.7s (Equatorial J2000).
b) Current IAU celestial reference system is ICRS: 
the International Celestial Reference System.
c) Standard equinox and epoch currently in use are J2000.0, 
but it is not always the case. For example, the reference epoch 
for the Gaia Data Release 3 is ICRS at epoch = 2016.0, 
while it is epoch = 2015.5 for Gaia Data Release 2 
and epoch = 2015.0 for Gaia Data Release 1.


Photometry (§2.3)

  1. State the facility, telescope and instrument used.
  2. Describe the method used to estimate photometry.
  3. Use standard passband/filter identifiers.
  4. Clarify the magnitude system.
  5. Specify spectral transitions completely.

Examples

a) Facility ground-based or space-based, specific instrument configuration information.
b) Point spread function fitting, aperture photometry, etc. 
c) Indicate “Johnson B” or “Cousins B” instead of just “B”; 
use “2MASS Ks” instead of just “K”. 
d) Magnitude on the AB, Vega, ST, or some other magnitude system.
e) Carbon monoxide (CO) has several detectable transitions as do ^13^CO 
and C^17^O. The most commonly observed transition is (J=1-0) and each is 
between 110 and 115 GHz. To clearly define a spectral transition, 
one should use, e.g., “CO (J=1-0) ν=115 GHz”. 


Time (§2.4)

  1. Provide the time of observation and exposure time.
  2. Favor full Julian Dates over abbreviated or offset Julian Dates.
  3. Include phase timing measures along with reported periods when relevant.
  4. State when observations from multiple missions are executed simultaneously.

Examples

a) Explicitly described in terms of both the frame of reference
(e.g., JD, BJD, HJD), and the time system used (e.g., UTC, TDB, TAI). 
For example, use “BJD-TDB” to indicate Barycentric Julian Date 
in the Barycentric Dynamical Time standard (preferred).
b) When reporting Julian Dates, the full unmodified date (e.g., 2456789.123) 
is preferred over any offset variation (e.g., 6789.123), to avoid confusion.
Note that MJD (Modified Julian Date) is = Julian Date - 2400000.5, by IAU definition.
c) For a transiting exoplanet orbit where the period is known, 
include a time of transit.
d) If possible, include a graphical representation of the times that 
the missions obtained the data to help visualize where the simultaneity occurs.


Redshift/velocity (§2.5)

  1. Describe the method of redshift measurements (spectroscopic, photometric, etc.) and give references to the model/method.
  2. Specify the reference frame of the redshift measurements (barycentric, heliocentric, galactocentric, etc.).
  3. Provide the frequency/wavelength from which the measurement is obtained.
  4. State whether a published recessional velocity is based on observed frequency or wavelength shifts (i.e., radio or optical convention).
  5. Indicate the quality of the measurement when possible.

Examples

a) Describe particular method (spectroscopic, photometric, Friends-of-Friends, etc.)
and base assumptions used in the models (template fitting, machine learning, etc).
b) Include a clear indication of the reference frame, e.g., heliocentric,
barycentric, Galactocentric, or LSR (Local Standard of Rest).
c) A redshift measured from HI 21 cm emission line may have a significantly 
different systematic velocity than a redshift measured from stellar absorption lines 
in the same galaxy.
d) The radio velocity increment depends upon the rest frequency,
whereas the optical velocity increment depends on the observing frequency.
e) Add a flag indicating low quality as poor seing or 
redshift based on a single spectral line.


Classifications (§2.6)

  1. Utilize established classifications as available.
  2. Define (new) classifications clearly.

Examples

a) For basic morphological types, use well-established schemes 
(e.g., Sandage 2005). 
Authors are encouraged to refer to SIMBAD’s Object Classification
or NED’s galaxy classifications and attributes, which have been standardized 
to enable unified queries across journal articles and catalogues.
b) Explain any abbreviations and/or give a definition of the classification.


Orbital parameters (§2.7)

  1. Avoid using “longitude of periapsis” in place of “argument of periapsis”.
  2. Be explicit about which body’s orbit a longitude or argument of periapsis refers to (e.g., planet or host star).
  3. Include time of periapsis as appropriate.

Examples

a) Only use “longitude of periapsis” when referring to the sum of 
the argument of periapsis and the longitude of the ascending node.
b) The argument of periapsis for a planet or a secondary star’s orbit 
differs from that of the host or primary star’s reflex motion by 180 degrees.
c) When reporting timing for a non-transiting eccentric orbit for 
which argument of periapsis is measured, report time of periapsis in preference to 
(or in addition to) time of inferior conjunction. Both are preferred if possible. 


Tables(Section §3.1 from article)

  1. Provide a clear title and unambiguous labels for columns.
  2. Explain the content of each column, including symbols and flags.
  3. Keep each column homogeneous.
  4. Use the same explicitly defined non-numeric representations for missing (null) values throughout.
  5. Prepare ReadMe files for machine-readable tables.
  6. Give the complete names of the objects (§2.1) in each table, and keep the same names in all the tables and text throughout the article when possible.

Examples

a) Indicate the units for each column when applicable.
b) Make a clear distinction between z the redshift, z the filter or z the metallicity.
c) A single column should not present measurements with different units, 
mix errors with limits or comments, or append flags to values.

d) Use null values that are supported and documented by widely-used 
toolkits, e.g., “NaN” (Not a Number) for floating-point data in Astropy. 
Use the same representation for missing data and have a separate field 
that explains the reasons for a missing value. 
Do not use different representations to indicate the different reasons.

Wrong example: Using 0.00 for “not observed”, and -99.99 for “no detection” in the same column.
Stick to only one NULL value per column!

e) Authors should include a human-readable description of the data,
with at least the column descriptions, units, and references (on the 
origin of the measurements or instruments for observations when relevant) 
in a ReadMe file. More help on the ReadMe is given in the Chapter "Submitting astronomical data".

Quiz

Quiz: How to make this table FAIR?

Object Redshift
Source 1 0.11
Source 2 0.22
Source 3 0.31
Source 4 0.41
Source 5 0.52

Redshift quality flag: 1 = secure, 2 = uncertain

The quality of the measurement is indicated using a superscript. Providing this information using an extra column will make the table more machine readable, and therefore the data more reusable.

Object Redshift Quality
Source 1 0.1 1
Source 2 0.2 2
Source 3 0.3 1
Source 4 0.4 1
Source 5 0.5 2

Redshift quality flag: 1 = secure, 2 = uncertain


Figures (§3.2)

  1. Provide clear caption, legend and axis labels for each figure.
  2. Design the graphics to be accessible.
  3. Make “data behind the plots” publicly available.

Examples

a) Describe in detail what is presented in the figure, what different 
colors, symbols, and lines represent. Units of the axis labels should 
be included when applicable. In practice, figures should be able to 
stand alone without requiring much reading of the main text.
b) Color-blind users would benefit from symbols that vary in shape 
in addition to colors. See the AAS journals’ graphics guide for more advice.
c) Make the original data files used to generate the figures publicly 
available, as this will greatly enhance the ability to reproduce, 
validate, or build upon published results.
Those files will also be harvested by CDS (as shown in the example below).

Interactive window: Example of spectra generated by VizieR based on original data files (“data behind figure”).


Data archiving and access (Section §4 from article)

  1. Append small data sets as part of the publication.
  2. Deposit large or complex data at a long-term archive most appropriate for your data. Adhere to the specific format requirements from the archives.
  3. Provide a complete list of metadata.
  4. Include a Data Availability Statement if required by the journal.
  5. Do not publish data sets at URLs lacking long-term support.
  6. Use unique and informative names for the files instead of duplicating file names and using location in a directory structure as file metadata necessary to uniquely identify a file.

Examples

a) Preserve data as supplementary materials with your final journal 
article, or post the data files with your arXiv preprint.
b) If the data are either too large or too complex to be hosted by the
journal, authors are encouraged to place their data in a trusted 
repository that issues Digital Object Identifiers (DOIs).
c) Visualize the position and orientation of the apertures on imagery, 
key metadata including aperture dimensions, center coordinates, 
and position angle are required.
d) Example from MNRAS.
e) We strongly discourage the publication of URLs to personal web servers 
hosting data sets for which the author or institution has no means to maintain 
for many years after the publication of the associated journal article.
f) if photometry data are available at different bands (e.g., V and R) 
for the same object (e.g., NGC 1275), use names such as NGC1275_V.dat
and NGC1275_R.dat to identify the files. Do not set up separate directories 
for V and R band, and give the same file name NGC1275.dat under both directories.


Literature citations (Section §5.1 from article)

  1. Cite the original references.
  2. Use preferred citations by the authors.
  3. Provide full provenance of the data. Credit the originator of archival data, including the Principal Investigator.
  4. Include all references in the bibliography section.
  5. Distinguish original data in your article and data taken from other work.

Examples

a)  “We adopted a heliocentric redshift of 1.234 (Smith et al. 2012) 
via NED”, where “Smith et al. 2012” is listed correctly in your bibliography.
b) The 2MASS web page requests that you cite the canonical paper by 
Skrutskie et al. (2006), instead of the Explanatory Supplement. 
d) Make sure all appropriate references to papers, software and data 
products are included in a paper’s bibliography section, not just in footnotes.
e) Use phrases such as “This work” to clearly identify original 
data in your article.


Facility credits (§5.2)

  1. Indicate the facilities involved, such as telescopes, instruments, and databases.
  2. Use standard keywords when possible.
  3. Include facility’s own statement if available.

Examples

a) Always describe the facilities or services used, 
and make sure the name is unique.
b) See AAS keyword tags with AASTeX \facility and \facilities. 
c) This research has made use of the VizieR catalogue access tool, CDS, 
Strasbourg, France (DOI: 10.26093/cds/vizier). The original description 
of the VizieR service was published in 2000, A&AS 143, 23.

Quiz

Quiz: Can you guess the names of these facilities?

Q1) ARO

Q2) DDO

Q3) EMIR

Q4) OSIRIS

Names as published Possible interpretation
ARO Astronomical Research Observatory
Arizona Radio Observatory
Abbey Ridge Observatory
Algonquin Radio Observatory
DDO David Dunlap Observatory:0.15m
David Dunlap Observatory:0.5m
David Dunlap Observatory:0.6m
David Dunlap Observatory:1.88m
EMIR Eight MIxer Receiver (on the IRAM 30m radio telescope)
Espectrógrafo Multiobjeto Infra-Rojo (on the Gran Telescopio Canarias)
OSIRIS OH-Suppressing Infra-Red Imaging Spectrograph (on the Keck I telescope)
Ohio State Infrared Imager/Spectrograph (on the SOAR telescope)
Optical System for Imaging and low-Intermediate-Resolution Integrated Spectroscopy (on the Gran Telescopio Canarias)


Software credits (§5.3)

  1. List the software and version used in the production of the article.

Examples

a) Use the preferred citation if available, e.g., the paper describing 
the software. If not, include the name of the author(s),
title of the program/source code, the code version 
and a URL link to the code publisher.


Digital object identifiers - DOI (§5.4)

  1. Use DOIs to cite data sets, software and services if available.

Examples

a) The DOI links should be included in the bibliography to ensure 
proper citation: eg., 
Wenger, M., Ochsenbein, F., Egret, D., et al. 2000, A&AS,
143, 9, doi: 10.1051/aas:2000332


Data content keywords (Section §6 from article)

  1. Tag articles with relevant data content keywords from the UAT (Unified Astronomy Thesaurus).

Examples

a) Tag your articles with UAT keywords that best describe the types 
of data contained in the article.

Summary: Small tips for FAIR tables


Key Points

  • For tables containing known astronomical objects, an existing non-altered name must be given along with the coordinates.
    • Important point: Tables of astronomical objects without coordinates cannot be added into the SIMBAD database.
  • All columns must be explained with their corresponding unit.
  • Your columns should be homogeneous, and you should avoid mixing measurements with different meanings: errors mixed with limits, or values with different units (that should be in different columns). One same unit per column in a table.
  • When there is more than one table, the objects in common must be identified with the same name between tables.

Next chapters


In the next chapters, you will learn what are the different submission routes (depending on your journal) and then how to prepare and submit your data to VizieR. Ultimately you will see how to search your data using the EOSC tools.

Content from Submission routes and journals


Last updated on 2023-11-02 | Edit this page

Overview

Questions

  • Where are the catalogues coming from?
  • What to do with my data knowing my journal?

Objectives

  • Describe the different submission routes per journal

Overview


The submission routes for your data vary depending where you submitted your paper.

2 types of VizieR workflows:

  • Initiated by authors
  • Initiated by the CDS

Interactive plot below: Scroll over to see the number of VizieR catalogues per journal (updated weekly).

Link: Catalogues and files available at CDS

American Astronomical Society Journals (AAS)


The CDS selects tables from published papers which will be ingested in VizieR. Machine readable tables (MRT) are prioritised, but other tables or FITS spectra/images can also be added.

List of journals included:

  • The Astrophysical Journal: ApJ
  • The Astrophysical Journal Supplement Series: ApJS
  • The Astrophysical Journal Letters: ApJL
  • The Astronomical Journal: AJ

Recommendations (from the AAS webpage):

The authors should try to prepare their lenghty tables as machine readable table (MRT) format. Then, VizieR harvests AAS Journal MRTs and makes these data discoverable and searchable via Virtual Observatory protocols, which is another benefit to using this data format.

Reference (from the AAS webpage):

Astronomy & Astrophysics (A&A)


Before publication, A&A editors decide which data will be only available at CDS (in VizieR). The authors then need to submit the requested data to CDS and the data will be available at CDS just after publcation of the paper. Other tables already published can also be selected by the CDS.


Recommendations (from the A&A webpage):

By contract with the Journal, the CDS archives the primary data that are published in A&A and puts them at the disposal of the global community.

Once their papers accepted, the authors submit their MRT files (if possible) directly to the CDS.

References (from the A&A webpage):

Monthly Notices of the Royal Astronomical Society (MNRAS)


CDS selection or direct authors submission depending on CDS acceptation.


Recommendations (from the MNRAS webpage):

Authors are encouraged to upload machine-readable versions of their tables on the VizieR database. Authors should consult the CDS website for instructions on preparing and submitting tabular data, which include a template that can be adapted for MNRAS tables. A hyperlink can be included to CDS from the electronic text of the MNRAS article.

References (from the MNRAS webpage):

Other journals


Any other journals needs the author to submit the data.

Examples of other journals included in VizieR, ordered by their percentage of catalogues ingested in VizieR:

  • Publications of the Astronomical Society of the Pacific: PASP (0.85%)
  • Pis’ma v Astronomicheskii Zhurnal: PAZh / AstL (0.73%)
  • Astronomicheskii Zhurnal: AZh (0.66%)
  • Acta Astronomica: AcA (0.58%)
  • Publications of the Astronomical Society of Japan: PASJ (0.41%)
  • Astronomische Nachrichten: AN (0.30%)
  • Baltic Astronomy: BaltA (0.23%)
  • Nature: Natur (0.10%)
  • Science: Sci (0.08%)
  • Research notes of the AAS: RNAAS (0.008%)

Large datasets / surveys


Very large catalogues are usually directly retrieved by VizieR (from ESA, ESO, NASA, Stsci, …), after selection by CDS astronomers.

Note that the CDS is not a mirror, and a selection on the columns can be done.

Type of big catalogues included (as listed on the VizieR hierarchical organisation webpage):

Category Type of data Example of VizieR catalogue Description
I Astrometric Data I/355 Gaia DR3 Part 1. Main source (Gaia collaboration, 2022)
II Photometric Data II/328 AllWISE Data Release (Cutri+, 2013)
III Spectroscopic Data III/283 RAVE 6th data release (Steinmetz+, 2020)
IV Cross-Identifications IV/39 TESS Input Catalog version 8.2 (TIC v8.2) (Paegert+, 2021)
V Combined Data V/154 Sloan Digital Sky Surveys (SDSS), Release 16 (DR16) (Ahumada+, 2020)
VI Miscellaneous VI/135 All-sky spectrally matched Tycho2 stars (Pickles+, 2010)
VII Non-stellar Objects VII/233 The 2MASS Extended sources (IPAC/UMass, 2003-2006)
VIII Radio and Far-IR data VIII/65 1.4GHz NRAO VLA Sky Survey (NVSS) (Condon+, 1998)
IX High-Energy Data IX/68 XMM-Newton Serendipitous Source Catalogue 4XMM-DR12 (Webb+, 2023)

Summary: Submission per journal


Key Points

2 types of VizieR workflows:

  • Initiated by authors / editors
  • Initiated by the CDS

To face an increasing volume and according to the CDS mission, VizieR selects the articles to be processed.

  • Main journals origin : AAS (ApJ, ApJS, ApJL, AJ), A&A, MNRAS
  • Authors asking for VizieR publication
  • Scientific criteria such as the origin of the measurements (e.g. observations have a higher priority than models)
  • Effort required to make the data reusable (preferably MRT) is taken into account in the selection
  • Special case : A&A editors request authors to submit their data in CDS

Next chapters


In the next chapters, you will learn how to prepare and submit your data to VizieR step-by-step, and then you will follow the journey of your data in the Virtual Observatory and up to EOSC.

Content from Preparing your data


Last updated on 2023-09-11 | Edit this page

Overview

Questions

  • What kind of data can be submitted to VizieR?
  • Which formats are accepted?
  • How can I test my FITS?

Objectives

  • Define the type of data accepted in VizieR
  • Prepare your data, integrating the FAIR principles

Overview


In order to facilitate the usability of the data, and to allow their processing by the data centers, we at CDS require that:

  • the data are described accurately enough to allow an unambiguous interpretation of the data, as well as a comprehension of the context in which the data were acquired and/or processed; a single ascii file, named ReadMe, is designed for this role.
  • the data are in a format which allows their usage by tools currently in usage in our discipline.

The following data types are accepted:

  • Tables
  • Associated data : images, spectra, SED, time series, cube

In this chapter, we will answer to some frequently asked questions about how to prepare the data for their inclusion in VizieR.

A full description of the standard conventions used for the documentation of the catalogues is available from there.

Figure -- Summary data journey from a publication to VizieR and then EOSC: second step of the journey - step preparation of the data, right after - step data published in a refereed paper
Figure: Journey from a publication to EOSC, step 2 “preparation of the data”

Tables


Tables are the main data types accepted. An example of ingested table is shown below.

Screenshot: Example of a VizieR table ingested. The figure shows the first 20 rows for the table 'I/355/gaiadr3'.
Figure: Example of a VizieR table ingested: catalogue Gaia DR3 - ref I/355/gaiadr3 (screenshot)

The following formats are welcome:

Accepted format

In order of preference:

  • Machine-Readable Tables (MRT / FORTRAN format)
  • TSV, CSV: TSV (tab-separated values) or CSV (comma-separated values) files containing a first line with column description is accepted
    • Comments (line beginning by #) are accepted but not transformed into MRT at the submission step
  • ASCII aligned files
    • Original ASCII format used in VizieR
  • FITS table
    • FITS containing only one HDU
  • VOTable
    • Standard XML output defined by the IVOA. VizieR allows to upload VOTable which contain a single table.

Note: zip and gzip compression are allowed. Archives should not contain directories.

The submit web application allows different formats for tables and helps you to ingest your data.

What cannot be used

Postscript or word/excel files.

Files containing different numbers of columns per rows.

Checklist when creating a table

Checklist

  • Present the appropriate number of significant figures for numerical measurements and uncertainties (cf. §2c ).
  • An explanation is given for all the columns (at least a clear label and a short explanation; at best, complete explanation) (cf. §3.1).
  • Units are given for all the columns (when relevant).
  • One column is homogeneous, i.e. does not mix different measurements having different units or errors with limit values or flags.
  • For tables with sources, give:
    • Coordinates as accurate as possible and their epoch (cf. §2.2).
      • A bonus would be to have their uncertainty.
      • Another bonus would be to have their wavelength, and/or the instrument origin.
    • A non-altered name (cf. §2.1)
  • It is really much better to stick to the same identifier from a table to another.
  • If the measurements come from other papers, the references to those papers should be explicitly given using the bibcode.
  • Use the same representation of missing data and have a separate field that explains the different reasons (cf. §3.1d) for missing values if needed.

More details on how to create FAIR tables can be found in the previous Chapter “FAIR principles for astronomical data”.

Following this checklist will help improving the efficiency of the ingestion in VizieR and also the visibility and reusability of your data :).

Associated data: images, spectra, time series, cube


In addition to the Tables, other data type are associated.

Screenshot: Example of spectra ingested and available through the Saada VizieR page, associated with the catalogue 'J/A+A/589/A36'.
Figure: Example of VizieR associated data from the Saada/VizieR page: spectra (screenshot)

Reusable file formats for associated data:

Accepted format

Product type Accepted format
images (2-D data) FITS
spectra, sed (1-D data) FITS, VOTable, ASCII
time series ASCII, FITS
cube FITS

Note: Although multiple formats are accepted, only FITS files will be indexed and accessible through the VizieR associated data service and through the Virtual Observatory. The other formats will still be accessible through the FTP.

What cannot be used

Postscript or word/excel documents.

GIF or JPEG images.

General rules for FITS

In order to index FITS formats as accurately as possible, several basic meta-data are needed, usually readily provided by the telescope software and data processing pipelines such as IRAF and MIDAS, as FITS header keywords.

As a good practice, any FITS header should provide:

  • WCS projections for positions and spectral data
  • Standard FITS keywords described in the NASA FITS description.

List of important keywords to be included:

  • Positions
    • target name
    • coordinates: full WCS (World Coordinate System) is recommended including the projection, the center … (FITS cards: CRVAL, CDELT, CDi_j, CTYPE, CUNIT, CRPIX..)
  • Wavelength(s) of observations
    • full WCS recommended including units (ex: IRAF WAT cards)
  • Observation date
  • Telescope: instrument and Facility

For the sake of clarity and simplicity, we ask the authors, whenever possible, to stick to the “1 item = 1 file” rule, i.e. 1 star spectrum per file, for instance. In this simplest context, fits files should ideally also have only 1 HDU.

File naming conventions


According to ISO 9660 standard, filenames are written as name.extension, with at most 8 characters for name and 3 characters for extension as detailed below:

  • 8 alphanumeric characters, plus the minus sign and the underscore, are allowed ([a-z0-9_-])
    • For files corresponding to published material, the names are consistent with the published paper, and we use tablen.extension to refer to the table numbered n in the published paper, fign.extension for the figure numbered n, etc.
  • Followed by a dot
  • And an extension made of 3 characters related to the format of the files, following the conventions:
    • .dat for data files,
    • .fit for FITS files,
    • .tex for TeX/LaTeX files,
    • and .txt for text files (ascii files containing only printable text).

Full details about the files and directories structures can be found in the Adopted Standards for Catalogues document.

Quiz: Are these filenames correct or not?

Q1) ‘TABLE998.dat’

Q2) ‘table.4.long.version.dat’

Q3) ‘table.dat’

Q4) ‘test+table.dat’

Q5) ‘my_table.data’

Q6) ‘table&data.dat’

Q7) ‘myfile.jpg’

R1) No: filenames should be in lowercase.

R2) No: filename too long. This filename will be shortened afterwards anyway.

R3) Yes: correct. Although not very self-explanatory.

R4) No: ‘+’ character not supported.

R5) No: data files are represented as ‘.dat’.

R6) No: ‘&’ character not supported.

R7) No: File type not accepted.

VizieR FITS validator


You can test your FITS spectra/images with the VizieR FITS validator tool which evaluates the compatibility between your FITS and the asked meta-data.

This meta-data checking is also completed during the VizieR submission service.

Interactive window: VizieR FITS validator tool webpage. Note that you can test your FITS directly (it is not an image).

Summary: Type of data accepted by VizieR


Key Points

Data types accepted:

  • Tables (including at least a ReadMe file)
  • Associated data : images, spectra, SED, time series, cube

Preferred formats:

  • Tables should be preferably written as machine-readable tables (MRT).
  • For the associated data, the FITS format is the preferred solution.

Do not forget to test your FITS data before submission with the VizieR FITS validator tool.

Next chapters


In the next chapters, you will learn how to submit your data to VizieR, and finally how to search your data using the EOSC tools.

Content from Submitting astronomical data


Last updated on 2023-10-31 | Edit this page

Overview

Questions

  • Who need to submit their data?
  • How and where to submit your data?
  • What is a ReadMe file?
  • Do and don’t when submitting data

Objectives

  • Fill a ReadMe file
  • Submit your data in the existing and new Open Science systems, keeping in mind the Virtual Observatory (VO) standards

Overview


In the following Chapter, we detail the data submission part which happens after the acceptance / publication phase.

Not everybody needs to send their data to CDS! We invite you to read carefully the next sections to understand your next steps.

Figure -- Summary data journey from a publication to VizieR and then EOSC: third step of the journey - step submission of the data, right after - step data published in a refereed paper, step preparation of the data
Figure: Journey from a publication to EOSC, step 3 “submission of the data”

Who need to submit their data?


If your article does not contain any tables, you can skip the following!

Otherwise, the following checklist summarizes what to do.

What the authors need to do?

  • Who need to submit their tables to VizieR and when?
    • A&A authors, once your paper is accepted and if the A&A editors say so
  • Which journal will be automatically harvested by VizieR after publication?
    • AAS journals: AJ, ApJ, ApJS
    • MNRAS, Nature, PASJ, PASP and Science
  • My journal is not listed above, but I would like my tables to be available at CDS, what can I do?
    • You can still submit your data to VizieR

In any case, the authors are encouraged to check the integrity of their tables after publication in the corresponding papers. Besides, if an editor asks that some data should be sent to CDS: make sure to send them!

More information on the submission routes per journal are available in the Chapter “Submission routes and journals”.

Where to deposit the data?


Two routes are possible to submit your data to VizieR:

  • Option 1: If not too bulky, use the submission online interface
    • The upload will generate a ReadMe skeleton file based on your tables and perform a few basic checks.
      • You will need to check and complete this ReadMe file.
      • Alternatively, if you already have created a ReadMe file, you can upload it directly along with your tables.
    • In addition to your table, FITS files (spectra or images) for associated data can be uploaded.
  • Option 2: Use the File Transfer Protocol (FTP)

For larger files or any other questions, please contact the VizieR staff: cds-cats(at)unistra.fr .

ReadMe file


The why and what of the ReadMe file

To be reusable, any data needs extra information. In astronomy, the convention is to give this information (authors, title, references, summary, etc…) in a separate file called ReadMe.

Every catalogue in the Virtual Observatory registry has its own ReadMe. Numerous examples can be found on the FTP directories at CDS. A typical illustration could be e.g. J/A+A/382/389/ReadMe.

Here is an example of a ReadMe skeleton:

Reference             Short Title                                (Author+, Year)
================================================================================
Title of the paper
    Authors
   <Reference (200Y)>
   =20YYjjjjjvvvvLppppA
================================================================================
ADC_Keywords:
Keywords:

Abstract:
Description:

Objects:
    -----------------------------------------
       RA   (2000)   DE    Designation(s)
    -----------------------------------------

File Summary:
--------------------------------------------------------------------------------
 FileName Lrecl  Records  Explanations
--------------------------------------------------------------------------------
ReadMe       80        .  This file
table.dat    28      120



Description of file:    [if needed]

See also:

Byte-by-byte Description of file: table.dat
--------------------------------------------------------------------------------
   Bytes Format Units   Label     Explanations
--------------------------------------------------------------------------------
   1-  8   A8    --- col_0 ? [ 0123456CGMN]
  10- 14   F5.2  --- col_1 ?
  16- 21   F6.2  --- col_2 ?
  23- 27   F5.2  --- col_3 ?


Acknowledgements:

References:

================================================================================

How to generate the ReadMe file?

There are two recommended ways to generate your own ReadMe file:

  • With the submission online interface:
    • The upload table process generates a ReadMe skeleton and the standardized tables; both are required for VizieR.
    • This ReadMe file can then be edited or you can upload your own file.
  • With the cdspydreadme Python library:
    • This package builds ReadMe, standardized tables (in ASCII aligned format) or MRT tables from tables which can be in different formats (CSV, votable, FITS, astropy.Tables, or MRT formats).
    • The whole ReadMe can then be tested with the command line tool anafile.
Screenshot: ReadMe Generator Python library Github page, from July 2023
Figure: ReadMe Generator Python library Github page (screenshot)

How to fill the ReadMe file

The Standards for Astronomical Catalogues gives a complete description of the standard for ReadMe files.

In the following we give some tips on how to fill three different sections of the ReadMe file properly. The rest of the ReadMe will be filled by CDS, and thus does not appear below.

The ReadMe file in more depth

General comment: no line in this description file can exceed 80 characters!

It is moreover suggested to limit the textual parts to 70 characters, such that a conversion to FITS could keep the text as COMMENT cards.

  • Title of the paper: Title as written in paper
  • Authors: List of authors or collaboration
  • Reference: If available (bibcode), otherwise the article number
  • Abstract: Describes the scientific results that the author(s) derived from the data.
    • It is simply the one from your refereed paper.
  • Description: Gives the context of the data, such as instrumentation or observing conditions.
    • It should answer the following questions: what, where, when, how.

This section describes the structure of each data files (files with the .dat extension). This description is made in a tabular form, each row describing one field (column) of the data file.

The description is presented as a five-column table with the following elements:

  • the starting (from 1) and ending byte of column, separated by a dash -; this dash is however not required for a single-byte column.
  • the format of the field, written as a fortran-like format
    • An for a character column made of n characters;
    • In for a column containing an integer number of n digits;
    • Fn.d for a column containing a float number of width n digits and up to d digits in the fractional part;
    • En.d for a number using the exponential notation;
  • the units used in the field. SI units are strongly encouraged, avoid the CGS units (for instance, use mW/m2 instead of ergs/s/cm2).
  • the label (heading) of the field, made of a single word (no embedded blank); a few basic conventions are used for usual parameters (e.g. positions) and related quantities (e.g. mean errors).
  • a short explanations of the contents of the column. This last field may also specify: the available range of the value in the column (using […]), the possibility of having unspecified or NULL values (using ?), the order of the values within the table (increasing or decreasing order). More details can be found here.

The CDS can help you to fill a part of the byte-by-byte description (bytes, format, label), but it is crucial that you fill the units and explanations columns.

Data submission (option 1): Online interface


The first option is to use the online interface available at: https://cdsarc.cds.unistra.fr/vizier.submit/.

This web application enables the upload of the data and invites you to fill a ReadMe file generated by the application. The VizieR upload application will create a TEMPORARY repository containing your tables, associated files (spectra, images, cubes) if any, and the ReadMe file.

However, even after completing all steps, your catalogue will not be readily available in VizieR. Additional work by the VizieR team is needed, including checks and homogenization to our standards, before the catalogue is fully ingested.

In the following, we show how to upload your data step-by-step. Additional information on how to use this service can be found on the VizieR catalogue upload (HELP) page.


Start a session

No account creation is needed. One just needs to choose a login for the session, eg. my_unique_id_2023. The login can be reused until the completion of the process but not after submission.

Note - 1: Your login is tight to a given IP address.

Note - 2: Please contact the VizieR staff: cds-cats(at)unistra.fr if a modification to a recently submitted upload is needed.

Interactive window: VizieR catalogue upload webpage. Note that it is not an image, you can submit your files directly from here.


Upload tables

The VizieR submit service detects automatically the input format. Please, upload here only the regular tabular data (tables). More information on the accepted formats can be found in the Chapter “Preparing your data”.

The upload will generate:

  • a ReadMe skeleton including the byte-by-byte based on your uploaded tables
  • ASCII tables with aligned columns (FORTRAN Format / MRT (Machine Readable Table))

Note: Check that all the tables are uploaded at this stage. Adding tables later implies to remove all tables and ReadMe.

Screenshot: VizieR Catalogue upload page, start uploading Tables after initiating a session, from June 2023
Figure: VizieR catalogue upload page - step Tables (screenshot)

Fill the ReadMe

Once the tables uploaded, the next step is to fill the ReadMe file. You can:

  • Edit online
  • Download and complete locally
  • Upload your own ReadMe

Then you submit the updated ReadMe with the “upload” button:

  • This action will check the consistency between the ReadMe and your files.
  • The page is refreshed and a report is displayed. This output is the result of the anafile package - see next section.
  • In case of error, the CDS recommends to review the ReadMe (and remove the ReadMe).
  • But, even if you encountered problems, you can still go to the next step.

Note - 1: The online mode makes regular backups. You can restore the last version and force the backup.

Note - 2: Check that all the tables are uploaded at this stage. Adding tables later implies to remove all tables and ReadMe.

Screenshot: VizieR Catalogue upload page, fill the ReadMe file after initiating a session and uploading your tables, from June 2023
Figure: VizieR catalogue upload page - step Readme (screenshot)

Upload FITS Spectra / Time series

Once the tables and the ReadMe are uploaded, you can upload your associated data: spectra and time series. More information on the accepted formats can be found in the Chapter “Preparing your data”.

The FITS upload generates a mapping between the spectra FITS header and ObsCore keywords. We strongly encourage you to modify the mapping to better describe your data.

Note: Although multiple formats are accepted, only FITS files will be indexed and accessible through the VizieR associated data service and through the Virtual Observatory. The other formats will still be accessible through the FTP.

The uploaded files can be of the following:

  • a single FITS file (possibly compressed)
  • a collection of independent FITS files gathered into an archive (.tar)
    • Independent means that headers are not similar and each FITS needs a dedicated mapping
  • a collection of similar FITS files gathered into an archive (.tar)
    • In that case, we make a unique mapping to describe all resources in the archive (.tar). Be careful, this option is available only if FITS have similar headers.

For each upload, a report is available. It is recommended to verify the mapping by clicking on the report header.

Screenshot: VizieR Catalogue upload page, upload associated data: FITS Spectra and Time Series after initiating a session, uploading your tables and filling the ReadMe file, from June 2023
Figure: VizieR catalogue upload page - step FITS Spectra(screenshot)

Upload FITS images

FITS images can also be uploaded in a similar way as the FITS Spectra (see above).

More information on the accepted formats can be found in the Chapter “Preparing your data”.

Screenshot: VizieR Catalogue upload page, upload associated data: FITS Images after initiating a session, uploading your tables, filling the ReadMe file and uploading FITS Spectra and Time Series, from June 2023
Figure: VizieR catalogue upload page - step FITS images (Screenshot)

Terminate

This is the last step of your catalogue submission.

Please, verify that everything is correct before ending your session. Note that you can go back to the current work using the same login until you have submitted this page with the “Yes” button at the end of the page.

IMPORTANT POINT: A summary of the files uploaded is available via a temporary link, but DO NOT USE this url as it will disappear after publication.

In the last step you can:

  • Upload additional files (keeping in mind the accepted formats)
  • Add a message to the VizieR staff

Providing an email address is required at that stage. The email will be used if additional information is needed.

Note: If a modification to a recently uploaded is needed, please contact the VizieR staff: cds-cats(at)unistra.fr.

Screenshot: VizieR Catalogue upload page, validate your submission after initiating a session, uploading your tables, filling the ReadMe file and uploading FITS Spectra / Time Series and FITS images, from June 2023
Figure: VizieR catalogue upload page - step Validate (Screenshot)

Data submission (option 2): File Transfer Protocol


An other option is to submit your data through File Transfer Protocol. This option is particularly recommended if the ReadMe file was created using the Python cdspyreadme library and for large files.

Since October 2021, FTP uploads requires a login authentification. A web application is available for authors to obtain a temporary login/password: https://cds.unistra.fr/ftp/token/. It includes some FTP instructions.

VizieR FTP token generation webpage. Note that it is not an image, you can already generate a temporary login.


Generate a temporary FTP login

The first step is to register to get a temporary login: available for 3 days.

To do so, the authors need to provide their email address (e.g. jane.doe(at)astro.fr) and choose a login (e.g. paper_carbon_jdoe_jul23). This unique FTP login (6 characters at least) will be dedicated for the upload of a single catalogue.

On success, a mail is sent to your email address with the FTP login and some instructions.


Upload to the CDS repository

Your freshly created account is ready and available for 3 days. You can put your files directly under the repository (the root ftp directory being dedicated per login).

More information on the accepted data and formats can be found in the Chapter “Preparing your data”.

Type the following command in a terminal to access the FTP server: ftp ftp.cds.unistra.fr.

OUTPUT

Connected to ftp.cds.unistra.fr.
220 Welcome to the CDS FTP service.
Name: paper_carbon_jdoe_jul23
331 Please specify the password.
Password:
230-------------------------------------------------------
230-CDS FTP repository for authors
230-
230- => temporary directoy for user paper_carbon_jdoe_jul23
230-
230-Note for binary upload: 
230-tape "bin" in prompt command to switch in binary mode
230-
230-ftp> bin
230-
230-(create Thu Jul  6 12:09:25 2023)
230-------------------------------------------------------
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
ftp> 

How to upload a table?

To upload a table (e.g. table.dat), use the command: put table.dat.

OUTPUT

ftp> put table.dat
local: table.dat remote: table.dat
200 PORT command successful. Consider using PASV.

Switching to binary mode

In case of Binary data, make sure that your session in in binary mode. This option, is not set by all FTP client - To force binary mode, use the command bin FTP protocol as follow.

OUTPUT

ftp> bin
200 Switching to Binary mode.

Passive mode

By accessing the FTP server in passive modeftp -p ftp.cds.unistra.fr –, one can get the directory listing using ls.

OUTPUT

ftp> ls
227 Entering Passive Mode (130,79,128,14,36,87).
150 Here comes the directory listing.
-rw-r--r--    1 115      120          1570 Jul 06 14:08 ReadMe.txt
-rw-r--r--    1 115      120        259200 Jul 06 14:09 spectra_test.fit
-rw-r--r--    1 115      120             0 Jul 06 14:02 table.dat
226 Directory send OK.

Notify the VizieR team

Once you are done uploading your data (ReadMe, tables and any other associated data), you need to SEND AN E-MAIL to the VizieR team telling them where you have placed files: cds-cats(at)unistra.fr.

Summary: Data submission


Key Points

2 options for data submission:

  • Online interface
  • File Transfer Protocol

The CDS provides tools to build ReadMe file and aligned ASCII tables (Machine Readable Table in FORTRAN format).

  • No need to build it from scratch!

The ReadMe file describe your tables by providing all necessary information to locate the catalogue (authors, title, abstract, keywords, acknowledgments, …).

  • This highly standardised file allows reusability and cross matching between catalogues.
  • A good description of your data is the key to discoverability.

Next chapters


In the next chapters, you will learn what happen to your submitted data before their full ingestion into VizieR and the rest of their journey into the Virtual Observatory and up to EOSC.

Content from Data curation at CDS


Last updated on 2023-11-06 | Edit this page

Overview

Questions

  • What happens to your data after submission to VizieR?
  • What is the data curation?

Objectives

  • Create FAIR tables, integrating “key” columns
  • Summarize all the steps happening behind the curtains once your data are submitted, and before their full integration into the VO

Overview


Once the data have been submitted on the CDS servers, the VizieR team will check that the data are compatible with standards. Once the data have been accepted, the CDS team will also add some valuable and relevant information such as metadata and links to other catalogues. This can lead to interactions with the authors, but we are trying to minimize the level of interaction.

Figure -- Summary data journey from a publication to VizieR and then EOSC: fourth step of the journey - step curation and verification of the data, right after - step data published in a refereed paper, step preparation of the data, step submission of the data
Figure: Journey from a publication to EOSC, step 4 “curation & verification”

Behind the scenes: verifications


In addition to the semi-automated verifications already done by the programs during the different steps of the ingestion, more in-depth verifications are done by the CDS team focusing on the reliability and the quality of the catalogues.

Important points to check

In the following, we present some corrections applied to real datasets.

Here are the 5 important points that would save us some time:

  • Units
  • Parameters description
  • Coordinates
  • Identifiers
  • Common key between tables

Verifications: Example 1 - Units

One key point is to the check the units.

Units corrected

In the example below the original unit for a cylindrical volume of a region (column Size from the figure below) was wrongly set to cm-3.

Screenshot -- Table with wrong units as displayed in paper
Figure: Before – Units as written in original paper (screenshot)

Our team picked it up and wrote to the author and made the description and unit correction (field V from the figure below).

Screenshot -- VizieR table with units corrected
Figure: After – Units corrected in VizieR table (screenshot)

Verifications: Example 2 - Coordinates

After the units, the coordinates are the most important data the VizieR team try to gather and curate. It is indeed the most common way to search for data in VizieR.

Coordinates corrected

Here is an example of coordinates with discrepancies when the declination is at 0 degree.

Screenshot -- Table with wrong coordinates as made available in paper
Figure: Before – Coordinates as written in original paper (screenshot). The following columns are displayed: Seq (catalog index number); BGPS source identifier; Hour, Minute and Second of Right Ascension (J2000); Sign, Degree, Arcminute, Arcsecond of the Declination (J2000).

Once the error detected by our team (missing minus sign for some Declinations), the positions were then updated, two years after the data ingestion in VizieR.

Screenshot -- VizieR table with coordinates corrected
Figure: After – Coordinates corrected in VizieR table (screenshot)

When there are none, positions can be added from other catalogues or from SIMBAD if available. Alternatively, we ask for them (sometimes we have an answer).

Coordinates added

In the following example, we can see that no coordinate was provided in the original table.

Screenshot -- Table without coordinates as available in paper
Figure: Before – Columns as written in original table (screenshot)

Using SIMBAD or otherwise the HASH PN databases (when no SIMBAD corresponding match has been found – SimbadName empty), we were able to complement this table with positions.

Screenshot -- Table with SimbadName and coordinates informations added to the original columns
Figure: After – Coordinates added in VizieR table (screenshot). The 4 columns in color are computed by VizieR, and not part of the original data.

Verifications: Example 3 - Identifiers

The third important thing for our team are the identifiers.

SIMBAD names added + misprint on names corrected

To retrieve coordinates and easy the cross identification between SIMBAD and VizieR, a proper identification is needed.

Here is an example of truncated SDSS names… Impossible to retrieve except by coordinates that we luckily have in this case.

Screenshot -- Table with truncated names as identifiers in paper
Figure: Before – Table as written in the original paper. The following columns are displayed: Type, Name, f_Name; Hour, Minute and Second of Right Ascension (J2000); Sign, Degree, Arcminute, Arcsecond of the Declination (J2000); Teff, log(g), l_[Fe/H], [Fe/H], l_[C/Fe], [C/Fe], l_[C/Fe]c, [C/Fe]c, l_A(C), A(C), l_[Ba/Fe], [Ba/Fe], f_[Ba/Fe], l_[Eu/Fe], [Eu/Fe], f_[Eu/Fe]; Class, Bin, f_Bin, Out, Ref.

So the SimbadName has been added after the process for SIMBAD where misprints on coordinates have been detected (identified by the column f_Name set to o below for Name = SDSS J1137+2553, and highlighted in both figures Before/After). For this object with coordinates pointing to nothing, the right ones have been found thanks to the bibcode given in the table.

Screenshot -- VizieR table with SIMBAD-names added and misprint on names (in the declinaison) corrected
Figure: After – Example of names recognized by SIMBAD added to the original table submitted to VizieR (screenshot)

Verifications: Example 4 - Odd values

We add mimimum and maximum values of numerical columns. It allows us to detect some oddities and it is helpful also for the astronomer who will validate the VizieR catalogue afterwards.

Min-max values added

Screenshot -- VizieR ReadMe file with minimum and maximum values added to the numerical fields
Figure: Example of minimum and maximum values (in brackets) added to a ReadMe file (screenshot)

We also add links between tables in VizieR. For instance, if an author said that magnitudes come from a certain survey, we actually point to that survey so we can verify the values. If a table contains galaxy clusters and another the membership, we can add the number of galaxy members per cluster, assuming the cluster names are the same in both tables.

Adding those links helps us to detect errors and missing data.

Verifications: Example 6 - Missing common key

Last but not least, to add links between tables we need a common key (e.g identifier, coordinates …).

Cross-identification between tables

In the two figures below, we can see an example taken from a paper with two tables (Tables A and B) with two similar first columns in both:

  • Name of the stellar system to which the star belongs
  • Name of the star

However, it is not obvious that Bel10018 (SimbadName: [BFO2002] UMi 10018) mentionned in Table A corresponds to COS 347 in Table B.

Screenshot -- Table A as displayed in paper
Figure: Before – Extract of Table A from paper (screenshot). The following columns are displayed: Name of stellar system, Star name, RA (J2000), DEC (J2000), Bmag, Vmag, Rmag, CoFe, e_CoFe, NiFe, e_NiFe.
Screenshot -- Table B as displayed in paper
Figure: Before – extract of Table B from paper (screenshot). The following columns are displayed: Name of the stellar system, Star name, Reference for HRS abundances, Cr-HRS, e_Cr-HRS, Co-HRS.

As there are no common identifier or coordinates repeated in the second table, the only alternative would have been to go through the list of references cited (3rd column of Table B) to get the coordinates and identify the object one by one. Therefore, the CDS team contacted the author to get the names and positions for Table B and create a better link between the two tables as displayed below.

Screenshot -- Table B updated as available on VizieR
Figure: After – extract of Table B as available in VizieR (screenshot)

Errata


As said before, the VizieR database is evolving every day: with new catalogues being added or old ones being updated.

Tables updated

In the example below, one table from the original catalogue was updated, to reflect the changes published in an erratum.

Screenshot -- Table from catalogue updated to be consistent with erratum publication
Figure: Example of a table updated following erratum publication (screenshot)

Data available to all


Once the data are public, they are accessible as plain files in FTP directories at CDS and other participating data centers (e.g. at CfA/Harvard (USA) or NOAJ/ADAC (Japan)), as well as all VO compatible services.

Figure -- Summary data journey from a publication to VizieR and then EOSC: fourth step of the journey - step curation and verification of the data, right after - step data published in a refereed paper, step preparation of the data, step submission of the data
Figure: Journey from a publication to EOSC, step 4 “curation & verification”

Summary: What happens to your data at CDS?


Key Points

Once the catalogues are submitted, a delay is needed for VizieR curation and validation before full ingestion!

The validation process involves some:

  • Verifications leading to corrections: ~ 30% of the references
  • Main corrections: identifiers, coordinates, units …

You cannot Find, Access and Re-use data if the coordinates/identifiers are not right!

Next chapters


In the next chapters, you will learn what happen to your fully ingested data when they continue their journey into the Virtual Observatory and up to EOSC.

Content from Journey of your data in the VO and EOSC


Last updated on 2023-09-07 | Edit this page

Overview

Questions

  • What happens to your data after their ingestion in VizieR?
  • Why publish to EOSC / VO?
  • How do you find your data in EOSC?

Objectives

  • Describe the integration of astronomy data services in the VO and EOSC
  • Naviguate through the EOSC portal

From VizieR to EOSC


In the previous chapters, we concentrated on the publication of data via the CDS VizieR service. We will now show how the process leads to the data being made visible in the EOSC Portal. We invite astronomy researchers to follow the journey of their data to EOSC!

Once your data have been successfully ingested into VizieR, they begin their journey into the Virtual Observatory, as illustrated in the figure below.

Figure -- Summary data journey from a publication to VizieR and then EOSC: last step of the journey - step integration into the IVOA, EUDAT and finally EOSC, right after - step data published in a refereed paper, step preparation of the data, step submission of the data, steps curation and verification of the data
Figure: Full data journey from a publication to EOSC, step 5 “integration into the IVOA, EUDAT and finally EOSC”

Work within the ESCAPE project and the EOSC Future project has been aimed at connecting the existing astronomy data publishing systems to the European Open Science Cloud (EOSC).

The first stop for your data after VizieR is the International Virtual Observatory Alliance (IVOA) registry. It gathers the data from ViZieR, and other astronomical data centers.

Then the datasets reach a broader audience when EUDAT/B2Find and the EOSC registries gather information (harvest) about everything emitted by the IVOA registry. In this way the familiar processes of publishing data become part of the wider EOSC system.

The European Open Science Cloud (EOSC)


EOSC is the European web of FAIR data and related services for research. It stands for European Open Science Cloud and it aims to make research data easy to find, access, interoperate and reuse (FAIR), making trusted and sustainable research outputs available within and across scientific disciplines.

The EOSC ambition is to federate existing data, research and e-infrastructures to make them all available to European researchers across borders and across disciplines.

EOSC Portal


The EOSC Portal is one of the results of the EOSC Future, European Commission funded project (2019-2023). It provides an interface to EOSC for researchers to utilise the full spectrum of EOSC resources, which include research publications, data, software, and value-added services to support their research.

For prospective users of the services, the Portal provides training materials and tutorials on how to use its features. The Portal also offers information for potential service providers on how to onboard their services to the EOSC Portal Catalogue & Marketplace.

Screenshot: EOSC Portal homepage, from September 2023
Figure: EOSC Portal homepage (screenshot from September 2023)

Case Study: Gaia DR3


In the following, we will show how to find your data in EOSC, using Gaia DR3 (VizieR DOI: 10.26093/cds/vizier.1355) as an example.

The first step is to type the name of your catalogue, dataset, paper, DOI in the Search bar on EOSC portal, as illustrated in the figure below.

Screenshot: EOSC Portal search for 'Gaia DR3' using homepage, from September 2023
Figure: Searching for ‘Gaia DR3’ on EOSC Portal – search from homepage (screenshot from September 2023)

You can also browse directly the Marketplace, from the same website. You will get the same results in the end.

Screenshot: EOSC Portal search for 'Gaia DR3' using 'Browse Marketplace' page, from September 2023
Figure: Searching for ‘Gaia DR3’ on EOSC Portal – Browse Marketplace tab (screenshot from September 2023)


Step 2: Scan list of records

A full list of search results appear (link to all results), as illustrated in the figure below. You can filter the results using the ‘Filters’ options on the left, or also by type of data at the top (eg. Catalogs, Publications, Data, Software …).

In our example (Gaia DR3), we can see that the search is done on multiple fields: Title, Author names, abstract.

Screenshot: EOSC Portal search for 'Gaia DR3' -- scanning through all the records, from September 2023
Figure: Searching for ‘Gaia DR3’ on EOSC Portal – all records (screenshot from September 2023)

In addition to the filters available on the left side of the EOSC Marketplace results page, more advanced filtering can be done using the ‘Show Advanced Search’ button below the main Search bar.

The search can be narrowed by ‘author’, ‘exact phrase’, ‘in title’, ‘none of’, as illustrated in the Figure below.

Screenshot: EOSC Portal search for 'Gaia DR3' -- scanning through all the records and filtering using the 'advanced search' option, from September 2023
Figure: Searching for ‘Gaia DR3’ on EOSC Portal – all records, with extra filtering (screenshot from September 2023)


Step 3: Select an entry

Once you found the relevant entry, you can click on it (link to selected result) and see many informations: Title, DOI, Date of publication, Publisher, Abstract, Relevant subjects (keywords), as illustrated in the figure below.

In this example, you can access to the resource by either clicking on the DOI link, or on the ‘Download from’ on the right side of the EOSC page, and then select one of the two data sources available: B2FIND or Strasbourg Astronomical Data Center.

Screenshot: EOSC Portal search for 'Gaia DR3', from September 2023
Figure: Searching for ‘Gaia DR3’ on EOSC Portal – result (screenshot from September 2023)

Examples: try it yourself


Can you answer the following questions?

Q1) What is the publication date for ‘Brightest cluster galaxies in Abell clusters’?

Q2) Which CDS service is listed in the ‘Services’ type of results?

Q3) Can you list some ‘cds-astro’ Softwares available?

Q4) Advanced question: can you find the dataset associated with the ‘National Radio Astronomy Observatory’?

Q5) If you published a paper or a dataset, type your name in the Search bar. What do you get as an output?

Browse through EOSC Portal to answer the previous questions.

R1) This paper was published on 01 Jan 2017.

R2) SIMBAD is the service accessible through EOSC portal.

R3) Examples of softwares available: cds-astro/tutorials: v1.0.0, cds-astro/aladin-lite: Aladin Lite v3.1.1, cds-astro/cds-moc-rust: Release v0.5.2, cds-astro/mocpy: Release v0.12.3

R4) Leave the Search bar empty, and type ‘National Radio Astronomy Observatory’ as the author in Advanced Search. Launch the search by clicking on the ‘Add’ button.

The search result Data should look like: search-result.

Summary


Key Points

  • EOSC is not a new digital infrastructure.

  • EOSC will support the cultural change towards Open Science and FAIR principles in the European countries and institutions.

  • Individual researchers will benefits from EOSC through their existing channels (e.g. universities, research institutes, research infrastructures, associations, science clusters, etc.) that will act as intermediaries.

  • EOSC will allow for universal access to data and a new level playing field for European researchers.

  • EOSC Portal Catalogue & Marketplace acts as an entry point to a multitude of services and resources for researchers, such as:

    • Discover Research Outputs (datasets, scientific publications, softwares)
    • Access Training Materials (lessons, courses, videos)
  • The catalogues ingested in VizieR are findable and accessible in EOSC!

  • Interdisciplinary science is facilitated using EOSC Portal, thanks to the interoperability of the data.

  • EOSC promotes the reuse of the data.

End!


This concludes this training on how to make your published astronomical data (table, images, spectra, …) FAIR and openly accessible to the community, and discoverable in Virtual Observatory tools such as EOSC.