PRIMER  4. files                                          file: 04-files
[Preliminary version]           DIRDIF-2007                 28 July 2007

------------------------------------------------------------------------
PRIMER
Section 4.     DIRDIF file definitions

 
Filenames are dependent on the computer and on local use. The different
files of the DIRDIF system are referred to by their functional type.
(The filename dictates the contents and the format of the file.) Within
the FORTRAN programs and in all documents (also in this PRIMER),
filenames are represented by capital letters. They may locally be
transcribed to lower case, and may be combined with the compound code or
a directory name (or otherwise changed to local conventions).
Example: for the test compound MONOS the primary crystal data is given
in the CRYSIN file. For the PC version of DIRDIF the filename remains
CRYSIN. For the VAX-VMS version the filename is CRYSIN.DAT. On all
Unix systems the same file is called monos.crysin. And so on.
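
The transcription in this example can be sketched in Python. This is
only an illustration: the function name local_filename and the system
labels "pc", "vms" and "unix" are hypothetical, and it is assumed (the
text states it only for CRYSIN) that the other file types follow the
same pattern.

```python
def local_filename(functional_type, ccode, system):
    """Map a DIRDIF functional file type to a local filename.

    Covers only the three conventions quoted in the example above.
    The labels "pc", "vms" and "unix" are hypothetical.
    """
    ftype = functional_type.upper()
    if system == "pc":
        return ftype                      # e.g. CRYSIN
    if system == "vms":
        return ftype + ".DAT"             # e.g. CRYSIN.DAT
    if system == "unix":
        # compound code and file type, both in lower case
        return f"{ccode.lower()}.{ftype.lower()}"
    raise ValueError(f"unknown system label: {system!r}")


print(local_filename("CRYSIN", "MONOS", "unix"))
```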
 
Input data:
DIRDIF needs an input reflection data file. 
The primary crystal data may be supplied manually, but it is preferred
to prepare the CRYSIN file in advance, using local conversion programs. 
For some options atomic parameters files are needed: 
for ORIENT: the model parameters => ATMOD,  
for TRACOR or PHASEX: parameters of the partial structure => ATOMS.
 
Standard file structure:  Most files consist of free-format records of
at most 72 characters each. The order of words (literals, numbers) in a
record usually is fixed. 
-  The first record is a header record with at least FILENAME and CCODE. 
-  The first word of a record is a keyword for identification. 
-  REMARK records (keyword=REMARK) may be inserted anywhere after the
   header.
-  The last record is an END or a FINISH record.
Note: reflection files have fixed format; REMARK records are not
permitted.
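
The standard structure described above can be read with a few lines of
Python. The sketch below is not part of DIRDIF; the function name
read_records is hypothetical, and skipping blank lines is an assumption.
It does not apply to the fixed-format reflection files.

```python
def read_records(text):
    """Return (keyword, words) pairs up to the END or FINISH record.

    REMARK records are dropped; the header is returned as the first
    pair, with the file type as its keyword.
    """
    records = []
    for number, line in enumerate(text.splitlines(), start=1):
        line = line.rstrip()
        if not line:
            continue  # blank lines are skipped (an assumption)
        if len(line) > 72:
            raise ValueError(f"record {number}: more than 72 characters")
        words = line.split()
        keyword = words[0].upper()
        if keyword in ("END", "FINISH"):
            return records
        if keyword != "REMARK":
            records.append((keyword, words[1:]))
    raise ValueError("no END or FINISH record found")


sample = "ATOMS MONOS test\nREMARK hello\nATOM C1 0.1 0.2 0.3\nEND\n"
for keyword, words in read_records(sample):
    print(keyword, words)
```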
 
 
3a.   Listing files LIS1 and LIS2
 
The system produces a file for printing (LIS1 = printable output) which
gives the most important information on the solution of the structure.
In addition a longer listing file, LIS2, is produced which gives
information on the input data, the execution of the various programs,
and their results. Inspect the file LIS2 only if you are interested or
when the structure did not come out as you hoped or expected. With the
aid of the detailed information you might be able to detect where things
went wrong, then change input data and start DIRDIF again. Certainly
LIS2 should not be printed routinely. But if things really go wrong, do
send the LIS1 and LIS2 prints (files) to Nijmegen: we will be glad to
help you!             
Note: before the next run is executed, LIS1 and LIS2 are copied to
LIS1X and LIS2X.
 
 
3b.   Atomic parameter files ATOMS and ATMOD
 
The input and output atomic parameter files of the DIRDIF system are:
- ATOMS file: input to most programs, overwritten with output
parameters,
- ATMOD file with the model parameters input to the program ORIENT,
- ATOLD file: a collection of parameter sets, to be used as back-up
file,
- XYZN, SPF, SCHAKAL: for communication with other program systems.
      (For instance: when XYZN is renamed to INS, the file is ready for
      input to the SHELXL least-squares refinement program.)
 
The ATOMS file consists of the following records, each containing a
keyword followed by data:
 
ATOMS    CCODE    more-info       (CCODE = compound code)
ATOM     atomname   x  y  z       (x,y,z: fractional atomic coordinates)
                                  (one atom / record, as many as needed)
REMARK   comments                 (optional, as many as desirable)
END                               (last record)
 
The atomname begins with the chemical symbol and may be followed by one
or more characters (e.g. C7, C+7, C7+, C7A are carbon atoms; CA is a
calcium atom, CX is an error). Alternatively the atomname may consist of
the chemical symbol, one or more blanks, and one unsigned integer number
( e.g. C  27 ). Uninterpreted (residual) peaks of a Fourier map are
given atomname = Q .
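
One possible reading of the naming rule for compact atom names is that
the leading run of letters must be a chemical symbol. The sketch below
follows that reading, which is an assumption and not a description of
the DIRDIF code. The symbol table is a small illustrative subset, the
function name element_of is hypothetical, and the form with blanks
(e.g. C  27) is not handled.

```python
import re

# Illustrative subset of chemical symbols, in capitals as in the files.
SYMBOLS = {"H", "C", "N", "O", "S", "P", "NA", "CA", "CL", "FE", "CU"}


def element_of(atomname):
    """Return the chemical symbol of a compact atom name.

    Residual Fourier peaks (leading letters 'Q') are returned as 'Q'.
    """
    name = atomname.strip().upper()
    match = re.match(r"[A-Z]+", name)
    if match is None:
        raise ValueError(f"{atomname!r} does not start with a letter")
    letters = match.group(0)
    if letters == "Q" or letters in SYMBOLS:
        return letters
    raise ValueError(f"{atomname!r}: {letters!r} is not a known symbol")


for name in ("C7", "C+7", "C7+", "C7A", "CA"):
    print(name, "->", element_of(name))
```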
 
It is possible to supply a site occupancy factor sof (sof = 1.00 also
for atoms on special positions; sof < 1.00 for disordered atoms) and an
isotropic temperature factor (B) on the ATOM record, but do so only if
you are sure about the data, because it will have a significant effect
on the scaling procedure.
 
When the structure has been solved the output ATOM records are provided
with a site occupancy factor (sof = 1.00) and an isotropic temperature
factor (B):
ATOM        atomname   x  y  z   sof   B
 
At the end of a structure solving run, the program NUTS/AT2X converts
the output ATOMS file to an XYZN file (equivalent to the INS/RES files
of the least-squares refinement program SHELXL) and, optionally, to SPF
and SCHAKAL files (input to the graphics programs PLUTON and SCHAKAL,
respectively).
  
Note: the ATOMS file may contain more than one atom set: the closing
      'END' may be followed by another 'ATOMS' record, etc., but in most
      cases only the first atom set is used.
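
The ATOMS file layout described above, including the optional sof and B
values and the possibility of several atom sets, can be read as
follows. This is a sketch under assumptions: the function name
parse_atom_sets and the dictionary keys are hypothetical, atom names
are taken to be single words, and sof and B are only accepted together.

```python
def parse_atom_sets(text):
    """Return a list of atom sets, one per ATOMS ... END block."""
    sets = []
    current = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        key = words[0].upper()
        if key == "ATOMS":
            ccode = words[1] if len(words) > 1 else ""
            current = {"ccode": ccode, "atoms": []}
        elif key == "ATOM" and current is not None:
            values = [float(w) for w in words[2:]]
            atom = {"name": words[1], "xyz": tuple(values[:3])}
            if len(values) >= 5:          # x y z sof B
                atom["sof"], atom["B"] = values[3], values[4]
            current["atoms"].append(atom)
        elif key == "END" and current is not None:
            sets.append(current)
            current = None
        # REMARK records and anything else are ignored
    return sets


example = """ATOMS MONOS first set
ATOM C1 0.1 0.2 0.3
REMARK a comment
ATOM Q 0.5 0.5 0.5 1.00 3.5
END
"""
print(parse_atom_sets(example)[0]["atoms"])
```

In line with the note above, a caller would normally use only the first
element of the returned list.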

 
The ATMOD file has the same structure as the ATOMS file.
 
Possible header records are:
ATMOD    MCODE    more-info                 (MCODE = Model code)
ATMOD    MNUM  MCODE                        (MNUM = Model number)
ATMOD    MCODE    MCELL a b c alpha beta gamma
ATMOD    CART
ATMOD    MCODE CART MNUM
ATOMS    CCODE                             (using cell of present CCODE)
 
ATOM records with atomic parameters  (one atom / record, as many records
as needed) contain either
     fractional coordinates:             ATOM   atomname    x  y  z
     or Cartesian coordinates:           ATOM   atomname    X  Y  Z
For the atomname see under ATOMS file.
 
REMARK records can be inserted (after the header) whenever needed.
END is the last record.
 
Notes.
The information CART (for Cartesian) is optional as DIRDIF finds out
whether the parameters are fractional or Cartesian. The information
'MCELL a b c alpha beta gamma' is necessary only when the fractional
atomic parameters of the model or fragment are represented in a unit
cell that is different from that of the present compound CCODE. (Instead
of 'MCELL' also 'CELL' is accepted.) In an interactive session the MCELL
data can also be provided at the terminal.
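
The header variants listed above can be told apart by looking at the
individual words. The sketch below shows one way to do this; it is not
the logic used by DIRDIF. It assumes that an unsigned integer is the
model number, that CART and MCELL (or CELL) are reserved words, and that
the first remaining word is the model code. The function name
parse_atmod_header and the model code NAPHT are hypothetical.

```python
def parse_atmod_header(line):
    """Classify the words of an ATMOD (or ATOMS) header record."""
    words = line.split()
    if not words or words[0].upper() not in ("ATMOD", "ATOMS"):
        raise ValueError("not an ATMOD or ATOMS header record")
    info = {"keyword": words[0].upper(), "code": None,
            "mnum": None, "cart": False, "cell": None}
    i = 1
    while i < len(words):
        word = words[i]
        upper = word.upper()
        if upper == "CART":
            info["cart"] = True
        elif upper in ("MCELL", "CELL"):
            cell = tuple(float(w) for w in words[i + 1:i + 7])
            if len(cell) != 6:
                raise ValueError("MCELL needs a b c alpha beta gamma")
            info["cell"] = cell
            i += 6                        # skip the six cell numbers
        elif word.isdigit() and info["mnum"] is None:
            info["mnum"] = int(word)
        elif info["code"] is None:
            info["code"] = word
        # any further words are treated as 'more-info' and ignored
        i += 1
    return info


print(parse_atmod_header("ATMOD NAPHT CART 2"))
```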
 
Atomic parameters of a known molecular model can be retrieved from the
DIRDIF-ORBASE fragment file in an interactive terminal session. For
larger structures these fragments may be too small. The Vector Search
method can often be used more effectively if you retrieve molecular
models from your own solved structures, or from the literature, or by
molecular modelling. It is convenient to prepare an ATMOD file in
advance, and modify the model (delete, rename, and add atoms)
interactively.
 
The ATMOD file, described so far, is input (e.g. by instruction: DIRDIF
CCODE ORBASE), and after checking, editing, and possible re-orientation,
a new ATMOD file is output with Cartesian coordinates (the original
input file is saved in the ATOLD file for back-up).

Note: the ATMOD file may contain more than one atom set: the closing
      'END' may be followed by another 'ATMOD' record, etc., and each
      ATMOD set is used by ORIENT.
      Such a multiple ATMOD file may be created by the program ORFLEX.


3c.   Crystal data files CRYSIN and CRYSDA
 
CRYSIN: primary crystal data: standard DIRDIF input file.
INS or RES: SHELXL control data files (contain the HKLF record, see 3d)
CIF   : IUCr-ActaCryst CIF data file for crystal data only
CRYSDA: extended crystal data, generated by subroutine CRYSDA
 
The program CRYSDA (usually called automatically) reads crystal data
from a CRYSIN file (highest priority) and/or from other input
possibilities (existing CRYSDA, INS/RES, CIF, keyboard) and produces a
CRYSDA file which contains the input crystal data and extended data such
as cell volume, calculated density, tables of scattering factors, etc.
If no CRYSIN file was available, or if the data in the CRYSIN file was
incomplete, or if the crystal data was modified interactively, a (new)
CRYSIN file will be output. The CRYSIN file is to be kept. Normally,
the CRYSDA file is deleted at the end of the job.
 

The CRYSIN file contains the following records:
 
CRYSIN   CCODE    more-info        (header)
TITLE    any user supplied information     (to be printed)
CELL     a b c alpha beta gamma            (Angstrom, degree)
CELLSD   esd's                             (six numbers)
SPGR     e.g. P 1 or P 21 21 21 or R -3    (axial directions are
                                           ( separated by blank(s))
FORMUL   At1 Nr1 At2 Nr2 At3 Nr3 ......    (Ati=chem.symbol,   Nri=nr of
                                           ( atoms Ati ,  max. 10 kinds)
         At7 Nr7 At8 Nr8 At9 Nr9 ......    (continuation record allowed)
                                           Example:    for  Na2CO3.7H2O:
                                           FORMUL NA 2 C 1 O 3  H 14 O 7
Z        number of FORMUL units / cell
                                      (Note: cell contents = Z * FORMUL)
                                        (! Z is not a symmetry factor !)
WAVE     Cu or Mo or Fe or Ag or Cr         (one atom type; no number)
ORIN     crystal orientation matrix         (OPTIONAL,  3 records)
END
 
Notes:
- If, during the crystal structure analysis, you wish to alter the cell
contents or the space group, you should do this in the CRYSIN file.
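
Two quantities derived from the CRYSIN data can be worked out by hand
or with the Python sketch below: the cell contents (Z * FORMUL) and the
cell volume, using the standard formula for a general (triclinic) cell.
The function names are hypothetical, Z = 4 is an arbitrary illustrative
value, and adding up a symbol that occurs twice on the FORMUL record (as
O does in the example) is an assumption about how such a record is
meant.

```python
import math


def cell_contents(formul_words, z):
    """Multiply the FORMUL pairs (symbol, number) by Z."""
    contents = {}
    pairs = zip(formul_words[0::2], formul_words[1::2])
    for symbol, count in pairs:
        key = symbol.upper()
        contents[key] = contents.get(key, 0.0) + z * float(count)
    return contents


def cell_volume(a, b, c, alpha, beta, gamma):
    """Cell volume in cubic Angstrom; angles are given in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    root = math.sqrt(1.0 - ca * ca - cb * cb - cg * cg + 2.0 * ca * cb * cg)
    return a * b * c * root


# FORMUL record of the Na2CO3.7H2O example, with an illustrative Z = 4
print(cell_contents("NA 2 C 1 O 3 H 14 O 7".split(), 4))
print(cell_volume(10.0, 10.0, 10.0, 90.0, 90.0, 90.0))
```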
 
 

3d.   Reflection data files
 
Input (formatted): FREF alias FREFA FREFB FREFC
               or  HKL alias SHELX SHELXL
Output (binary): BINFO   (to be kept for next runs)
 
The subroutine MERBIN finds out which input data file is present, reads
the reflection data, and writes a temporary binary reflection data file
BINFO.
Formats of the reflection data files:
 
FREF alias FREFA FREFB FREFC: formatted reflection data file,
                               28 characters/record
                               (standard DIRDIF file)  with  Fobs values
      first record:      header with 'FREF' or 'FREFA' ... and CCODE
      following records: 1 reflection each, FORMAT (A1,3I3,I2,F9.2,F7.2)
                              for:  ' ', h, k, l, JC, Fobs, sigma
                              JC=2 for 'unobserved' or 'unreliable',
                              else JC=1 or blank
      last record:       'E'
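
The fixed FREF format can be read by slicing each record at the column
positions implied by FORMAT (A1,3I3,I2,F9.2,F7.2). The Python sketch
below is illustrative only: the function names are hypothetical, each
number is assumed to carry an explicit decimal point, and a blank
numeric field is treated as zero.

```python
def _number(field):
    """Read a numeric field; a blank field is treated as zero here."""
    return float(field) if field.strip() else 0.0


def parse_fref(text):
    """Return (header words, list of (h, k, l, JC, Fobs, sigma))."""
    lines = text.splitlines()
    header = lines[0].split()             # e.g. ['FREF', 'MONOS']
    reflections = []
    for line in lines[1:]:
        if line.startswith("E"):          # closing 'E' record
            break
        rec = line.ljust(28)              # pad short records
        h, k, l = int(rec[1:4]), int(rec[4:7]), int(rec[7:10])
        jc = int(rec[10:12]) if rec[10:12].strip() else 1
        fobs, sigma = _number(rec[12:21]), _number(rec[21:28])
        reflections.append((h, k, l, jc, fobs, sigma))
    return header, reflections


record = " %3d%3d%3d%2d%9.2f%7.2f" % (1, 0, -2, 2, 123.45, 2.5)
print(parse_fref("FREF MONOS\n" + record + "\nE\n"))
```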
 
 
HKL alias SHELX SHELXL:  formatted reflection data file,
                         28 characters/record
                         with |Fobs| or |Fobs|**2 values
                              (defined by a HKLF record: no default!)
      First record:      HKLF header (optional, not SHELXL convention)
         First word:     'HKLF' on columns 1 - 4
         Second word:    the CCODE (optional, not checked)
         Then:           one number, either 3 or -3 : |Fobs| expected, 
                                     or  4 or -4 : |Fobs|**2 expected !
      Following records: 1 reflection/record,    FORMAT (3I4, 2F8.2)
                              for:  h, k, l, |Fobs|,    sigma
                              or:   h, k, l, |Fobs|**2, sigma
                (Note: the SHELXL batch number on cc. 29-32 is ignored.)
      Last record:       h = k = l = 0 (or: all blanks)
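
The HKL format can be read in the same way, using FORMAT (3I4, 2F8.2).
The Python sketch below is an illustration, not the MERBIN code. The
function names and the labels 'F' and 'F2' are hypothetical. Because
there is no default, the data type is returned as None when the HKLF
header is absent, and the caller has to decide. Any transformation
matrix on the HKLF record is ignored.

```python
def hklf_code(header_line):
    """Return the HKLF number (3, -3, 4 or -4) from the header record."""
    for word in header_line.split()[1:3]:
        try:
            code = int(word)
        except ValueError:
            continue                      # presumably the optional CCODE
        if abs(code) in (3, 4):
            return code
    raise ValueError("HKLF record without a valid number")


def parse_hkl(text):
    """Return (kind, list of (h, k, l, value, sigma)).

    kind is 'F' for |Fobs|, 'F2' for |Fobs|**2, or None if undefined.
    """
    lines = text.splitlines()
    kind = None
    if lines and lines[0][:4].upper() == "HKLF":
        kind = "F" if abs(hklf_code(lines[0])) == 3 else "F2"
        lines = lines[1:]
    reflections = []
    for line in lines:
        if not line.strip():              # all blanks: last record
            break
        rec = line.ljust(28)              # columns 29-32 are not read
        h, k, l = int(rec[0:4]), int(rec[4:8]), int(rec[8:12])
        if h == 0 and k == 0 and l == 0:  # h = k = l = 0: last record
            break
        reflections.append((h, k, l, float(rec[12:20]), float(rec[20:28])))
    return kind, reflections


record = "%4d%4d%4d%8.2f%8.2f" % (1, 2, -3, 250.0, 5.25)
print(parse_hkl("HKLF MONOS 4\n" + record + "\n\n"))
```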
 
      (Note about the SHELXL index transformation matrix Rij given on
      the HKLF record: this feature is available, but should be used
      with care!! It is not applied to the crystal data!)
 
Mind that a CIF file or an INS/RES file (SHELXL) can only be used
for crystal data input, not for reflection data input.
 
 
3e.   DDLOG file ('readable data')
 
This file contains a summary of DIRDIF runs with pertinent data. This
file is to be kept.
 
 
3f.   ORBASE and ORUSER files
 
ORBASE : a data base with molecular fragments.
ORUSER : a private extension of ORBASE  (with your own favourite models)
A write-up of these files is given in the header lines of these files.
The user is urged to add (manually) his own structural molecular
fragments to the file ORUSER for future use when solving 'similar'
compounds.
 
------------------------------------------------------------------------
