User:Sreed16/sandbox

File format conversion between SEG-D, SEG-Y, and SU is a routine task when working with Seismic Unix, the seismic processing software package supported by the Center for Wave Phenomena (CWP) at the Colorado School of Mines (CSM).



Introduction
File formatting is an integral facet of reflection seismology: it governs how data are recorded and stored during seismic studies for later processing on a computer. Typically a file is divided into a header block and data blocks, with the header block recording information about the data collected and the data blocks holding the seismic data themselves. Methods of recording and processing these data sets have evolved continuously alongside the development of computers. This created a need for data standards that allow collaboration between different companies and across different regions. Over time the file formats have had to be updated to keep pace with increases in data size and scope, and new standards have been developed as new techniques allowed more efficient data storage and manipulation. In 1966 the Society of Exploration Geophysicists (SEG) created the Digital Recording Standards Committee to address these issues; it later evolved into the Technical Standards Committee. Two of the SEG file formats that have seen widespread use are the so-called SEG-Y and SEG-D. Another commonly encountered format, derived from SEG-Y, is the SU file format of the popular Seismic Unix freeware package developed by the Center for Wave Phenomena at the Colorado School of Mines. SU is very similar to the SEG-Y file format minus the header geometry.

History of seismic file formatting

 * 1966: The SEG Committee on Digital Recording Standards was formed to recommend a standard 9-channel field tape format; this led to Format A and Format B being proposed in 1967 as possible answers to the standardization question
 * 1971: The need to address new instrument developments was recognized, leading to adaptations for instantaneous floating point amplifiers and 1600 bpi recording density
 * 1972: The Subcommittee on Field Tape Formats introduced Format C, a new IBM-compatible format with a full-word floating point designation
 * 1973: The SEG Committee on Technical Standards rolled out the SEG Exchange Tape Format, “SEG Ex”
 * 1975: Increasing demands led to the expansion of SEG Ex with the introduction of the SEG-Y format
 * 1975: New advances again demanded more flexibility in formatting; Format D was proposed as a new option, not as a replacement for A, B, C, or Y
 * 1994: Short-term revisions were made to SEG-D as revision 1
 * 1996: SEG-D was streamlined in revision 2 to allow more efficient use of higher-density media
 * 2002: SEG-Y revision 1 was rolled out to handle 3-D data acquisition
 * 2006: SEG-D revision 2.1 was issued to account for a decade of computing advances, with acknowledgement that SEG-E was likely on the horizon
 * 2012: SEG-D revision 3.0, a major revision, was designed to fix errors found in previous iterations and to meet new file demands, though it was not fully backward compatible

SEG-Y


A SEG-Y file may be written to any medium that supports variable-length records. Special rules apply when writing to CD-ROM.[2002 SEG-Y ref] The SEG-Y layout consists of four parts: a tape label, textual file headers, a binary file header, and trace data. SEG-Y supports two's complement integers in the headers and integers or floating point values in the trace data. Traces within a file may all have the same length or may vary in length.

The first 3200 bytes are the textual file header, which contains forty lines of readable text. The header should describe the seismic data being recorded. It is free-form, although the first twenty lines have a suggested layout that includes the date, company, equipment, sampling interval, and pattern. The optional extended textual file header is a separate textual header with a more defined structure.
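As a rough illustration of the layout described above, the following sketch splits a 3200-byte textual header into forty 80-character lines. It assumes the header is EBCDIC-encoded and that Python's `cp037` codec is an acceptable EBCDIC variant; the function name and the synthetic header are hypothetical and exist only so the example runs without an input file.

```python
def decode_textual_header(raw: bytes, encoding: str = "cp037") -> list[str]:
    """Split a 3200-byte textual file header into 40 lines of 80 characters.

    The encoding is an assumption: cp037 is one common EBCDIC code page.
    """
    if len(raw) != 3200:
        raise ValueError("textual file header must be exactly 3200 bytes")
    text = raw.decode(encoding)
    return [text[i:i + 80] for i in range(0, 3200, 80)]


# Build a synthetic EBCDIC header (hypothetical content) for demonstration.
cards = [f"C{n:2d} ".ljust(80) for n in range(1, 41)]
cards[0] = "C 1 CLIENT: EXAMPLE".ljust(80)
raw_header = "".join(cards).encode("cp037")

lines = decode_textual_header(raw_header)
print(len(lines), repr(lines[0].rstrip()))
```

A header written in plain ASCII could be handled by passing `encoding="ascii"` instead.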

The binary file header is 400 bytes long and contains binary values, defined as 2- or 4-byte two’s complement integers, that are needed to process the trace data. These values include ID numbers, the number of traces per ensemble, the number of samples per trace, the sample interval, and the data sample format code.
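A minimal sketch of reading three of these values is shown below. It assumes the SEG-Y revision 1 layout, in which the sample interval, the number of samples per trace, and the data sample format code sit at file bytes 3217-3218, 3221-3222, and 3225-3226, stored big-endian; the function name and the synthetic header block are hypothetical.

```python
import struct


def parse_binary_header(block: bytes) -> dict:
    """Extract a few 2-byte fields from a 400-byte SEG-Y binary file header.

    Offsets are relative to the start of the binary header, which begins at
    file byte 3201, so file bytes 3217-3218 are at offset 16.
    """
    if len(block) != 400:
        raise ValueError("binary file header must be exactly 400 bytes")
    (sample_interval_us,) = struct.unpack_from(">h", block, 16)
    (samples_per_trace,) = struct.unpack_from(">h", block, 20)
    (format_code,) = struct.unpack_from(">h", block, 24)
    return {
        "sample_interval_us": sample_interval_us,
        "samples_per_trace": samples_per_trace,
        "format_code": format_code,
    }


# Synthetic header block with made-up values, for demonstration only.
block = bytearray(400)
struct.pack_into(">h", block, 16, 2000)  # 2 ms sample interval
struct.pack_into(">h", block, 20, 1500)  # samples per trace
struct.pack_into(">h", block, 24, 1)     # format code 1
fields = parse_binary_header(bytes(block))
print(fields)
```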

The existence of extended textual file headers is indicated by a field in the binary file header, and the number of extended textual file headers is user defined. Their layout is more structured than that of the textual file header, and they should include information about navigation, 3-D bin grids, processing history, and acquisition parameters.

Trace headers contain trace attributes as 2- or 4-byte two’s complement integers. Trace headers are not meant to store significant amounts of ancillary data, but they should include values needed for processing, shot-point locations, ensemble coordinates, and measurement units. Trace lengths themselves are allowed to vary.

The trace data following each trace header are organized into ensembles of traces or series of stacked traces. Samples can be stored as two’s complement integers, IEEE floating point, or hexadecimal-exponent (IBM) floating point.
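To make the hexadecimal-exponent format concrete, the sketch below converts one 4-byte sample to a Python float. It assumes the conventional IBM single-precision layout (1 sign bit, a 7-bit excess-64 exponent of base 16, and a 24-bit fraction) stored big-endian; the function name is hypothetical.

```python
import struct


def ibm32_to_float(word: bytes) -> float:
    """Convert a 4-byte big-endian hexadecimal-exponent float to a Python float."""
    (bits,) = struct.unpack(">I", word)
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 24) & 0x7F                   # excess-64, base 16
    fraction = (bits & 0x00FFFFFF) / float(1 << 24)  # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


hundred = ibm32_to_float(bytes.fromhex("42640000"))
negative = ibm32_to_float(bytes.fromhex("C276A000"))
print(hundred, negative)
```

Reading the same four bytes as an IEEE float would give a different value, which is why the data sample format code in the binary file header matters.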

SEG-D



 * General Header #1- File Number, Format Code, General Constants, Time/Date, Manufacturer’s Code, Base Scan Interval, Polarity, Record Type/Length, Scan Types/Record, Channel Sets/Scan Type, Skew Blocks, Extended Header Blocks, External Header Blocks
 * General Header #2- Expanded File Number, Extended Channel Sets/Scan Type, Extended Header Blocks, Extended Skew Blocks, SEG-D rev number, General Trailer, Number of Blocks, Extended Record Length, Dominant Sampling Interval, External Header Blocks
 * General Header #3- Time Zero, Record Size, Data Size, Header Size, Extended Recording Mode, Rel Time Mode
 * *General Header #4- Abbreviated Vessel or Crew Name, Vessel or Crew Name
 * *General Header #5- Survey Area Name
 * *General Header #6- Client Identification
 * *General Header #7- Abbreviated Job ID, Job ID
 * *General Header #8- Line Abbreviation, Line ID
 * *Source Description Block- Source ID and specifications (Vibrator, Explosive, Airgun, Watergun, Electromagnetic, Other Source)
 * *Additional Source Info- Time, Source Status, Source ID, Source Moving, Error Description
 * *Source Auxiliary Channel Reference- Source ID, Scan Type Number, Channel Set Number, Trace Number
 * *Coordinate Reference System - Coordinate Reference System (CRS) ID
 * *Position Blocks – Time of position, Time of Measurement, Error, Position Type
 * *Relative Position Block – Offset easting, Offset northing, Offset vertical, Description
 * Scan Type Header- BCD value, MSB value, Scan Type number, Channel Set number, Channel Type, Channel Set Start/End time, Number of samples, Descale Multiplier, Number of channels, Sampling interval, Array forming, Trace Header extension, Extended Header Flag/Channel Gain, Vertical Stack, Streamer Number, Alias Filter Frequency/slope, Low Cut filter/slope, First, second, and third notch filter, Filter phase, Physical Unit, Filter Delay, Description
 * Demux Trace Header- File Number, Scan Type Number, Channel Set number, Trace number, First Timing Word, Trace Header Extension, Sample Skew, Trace Edit, Time Break Window, Extended Channel Set Number, Extended File Number
 * Trace Header Extension #1- Receiver Line number, Receiver Point number, Reshoot Index, Group Index, Depth Index, Extended Line Number, Extended Receiver point number, Sensor Type, Extended Trace Number, Number of Samples per Trace, Sensor Moving, Undefined Section, Physical Unit
 * *Sensor Info Header Extension- Instrument Test Time, Sensor Sensitivity, Instr Test Results, Serial Number
 * *Timestamp Header- Time Zero of Data, Undefined Section
 * *Sensor Calibration Header- Frequency, Amplitude, Phase, Calibration, Undefined section
 * *Time Drift Header- Time of deployment, Time of retrieval, Timer Offset Deployment, Timedrift corrected, Correction method, Undefined Section
 * *Orientation Header – Rotation X,Y,Z axis, Reference Orientation, Time Stamp, Orientation Type, Reference Orientation Validation, Rotation Applied
 * *Measurement Block- Timestamp, Measurement Value, Max/Min Value, Quantity Class, Unit, Measurement Description, Undefined Section
 * *Electromagnetic SRC/RECV DESC Block- Equipment Dimension X,Y,Z, Equipment offset X, Y,Z, Undefined Section
 * *General Trailer Description Block- BCD Value, MSB value, ASCII or binary, Block Size, Description
 * Blocks marked with * are optional

SEG-D is a seismic format that consists of three parts stored consecutively: a Record Header, Trace Data, and a Record Trailer.

The storage unit label occupies the first 128 bytes of the tape and consists of the storage unit sequence number, SEG-D revision, fixed or variable storage unit structure, binding edition, maximum block size, API producer organization code, creation date, serial number, a reserved field, storage set identifier, external label name, recording entity name, a user-defined field, and the maximum number of shot records per field record.

The headers are blocks of data preceding seismic data that contain information about the seismic data including acquisition parameters, geometry, and other user defined information. Header blocks contain at least three general headers, scan type headers and optional extended and external headers.

General headers are 32 bytes long and contain basic information including the file number, time, number of channel sets, and data sizes. There must be at least three general header blocks; additional general header blocks may be used if more information needs to be recorded. The 32nd byte of each header block contains an ID that is assigned according to the information the block holds.

Scan type headers are 96 bytes long and describe the recorded channels. The information includes filters, sampling intervals, and sample skew. A channel set is a group of channels that must have identical recording parameters, identical processing parameters, the same streamer cable origin, and the same group spacing. Scan type headers must be in the same order as their channel sets. There can be more than one scan type to accommodate dynamic changes in channel numbers and time intervals.

Sample skew headers follow the scan type headers and represent a fraction of the base scan interval. Sample skews are recorded in single bytes for each sample of each subscan of each channel set and have a resolution of 1/256 of the base scan interval. The demux trace headers and their trace data appear as one block of data; the header is a 20-byte identifier that precedes each channel’s data. The header repeats information from the general header and scan type header describing the trace, along with timing words, sample skews, and integrity checks. The demux trace header may be followed by a trace header extension that includes the receiver location, and by other user-defined headers containing sensor, source, position, and orientation information.
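The skew resolution described above amounts to a one-line calculation, sketched here. The function name is hypothetical, and the base scan interval is assumed to be supplied in milliseconds purely for illustration.

```python
def skew_to_time(skew_byte: int, base_scan_interval_ms: float) -> float:
    """Convert a one-byte sample skew into a time offset.

    Each count represents 1/256 of the base scan interval.
    """
    if not 0 <= skew_byte <= 255:
        raise ValueError("sample skew must fit in a single byte")
    return skew_byte / 256.0 * base_scan_interval_ms


half_interval = skew_to_time(128, 2.0)  # 128/256 of a 2 ms base scan interval
print(half_interval)
```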

The optional headers include the extended header, external header, and general trailer. The extended header allows equipment manufacturers to store information on the equipment and processes. The external header is an additional place for users to store information particular to their specifications. The general trailer stores auxiliary system and navigation data.

Data are recorded as a stream of bytes in demultiplexed format, with each trace coming from one channel in one channel set. The data represent the sign and magnitude of the instantaneous voltage presented to the system. Samples can be stored in 8, 16, 20, 24, 32, or 64 bits using one of several recording methods: binary exponent, quaternary exponent, hexadecimal exponent, IEEE floating point, or integer. The exponent formats work in similar ways, combining a sign bit (S), exponent bits (C), and fraction bits (Q) to give the input signal as $S.QQQQ\,Q\ldots \times b^{CC\ldots} \times \mathrm{DSM}$, where the base $b$ is 2, 4, or 16 and DSM is the descale multiplier. The floating point recording system also uses S, C, and Q; however, the input signal is given by $v \times \mathrm{DSM}$, where $v$ depends on C and Q. The integer recording system uses only integers and gives the input signal as $\left((I + 2^{N-1}) \bmod 2^{N} - 2^{N-1}\right) \times \mathrm{DSM}$ volts, where $I$ is the recorded $N$-bit integer. The DSM is a parameter, stored in the scan type header, that allows the dimensionless numbers recorded on tape to be descaled back to millivolts at the input. Sensor calibration values are also provided for the frequency and time domains; they are stored in the trace header extension blocks and in a calibration channel set, respectively.
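The integer expression above is the usual two's complement interpretation of an N-bit word followed by descaling. A minimal sketch, with a hypothetical function name and made-up DSM values:

```python
def descale_integer_sample(raw: int, bits: int, dsm: float) -> float:
    """Apply ((I + 2^(N-1)) mod 2^N - 2^(N-1)) * DSM to an unsigned N-bit word."""
    half = 1 << (bits - 1)
    signed = (raw + half) % (1 << bits) - half  # two's complement value of raw
    return signed * dsm


minus_one = descale_integer_sample(0xFFFFFF, 24, 0.5)  # all ones is -1 in 24 bits
print(minus_one)
```

Adding and then subtracting $2^{N-1}$ around the modulo maps raw words with the top bit set onto negative values, while leaving the others unchanged.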

SU
In Seismic Unix, data are stored and manipulated in the SU file format, which is based on the trace portion of the SEG-Y format. Instead of carrying the full set of SEG-Y headers, SU files consist only of the traces: each trace keeps its trace header, followed by samples stored as native binary floats. The EBCDIC and binary reel headers are not preserved in the SU format, so steps must be taken to convert between SEG-Y and SU files depending on the program being used for processing.
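The sketch below reads one trace from an in-memory SU stream. It assumes the SEG-Y-style 240-byte trace header with the number of samples at bytes 115-116 and the sample interval at bytes 117-118, both as unsigned 16-bit values in native byte order, followed by 4-byte native floats; the function name and the synthetic trace are hypothetical.

```python
import struct

HEADER_BYTES = 240  # trace header length assumed from the SEG-Y trace layout


def read_su_trace(buf: bytes, offset: int = 0):
    """Return (ns, dt, samples, next_offset) for the trace starting at offset."""
    # "=" selects native byte order with standard sizes and no padding.
    ns, dt = struct.unpack_from("=HH", buf, offset + 114)
    start = offset + HEADER_BYTES
    samples = list(struct.unpack_from(f"={ns}f", buf, start))
    return ns, dt, samples, start + 4 * ns


# Synthetic trace: 3 samples at a 4000 microsecond interval (made-up values).
header = bytearray(HEADER_BYTES)
struct.pack_into("=HH", header, 114, 3, 4000)
stream = bytes(header) + struct.pack("=3f", 0.5, -1.0, 2.0)

ns, dt, samples, next_offset = read_su_trace(stream)
print(ns, dt, samples, next_offset)
```

Because no reel headers precede the first trace, the first trace header starts at byte 0 of the stream.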

Conversion between SEG-Y and SU
When using Seismic Unix, SEG-Y data must first be converted to the SU format, because Seismic Unix cannot work on SEG-Y files directly. The segyread program reads a SEG-Y file and writes it out as SU traces. It is important to know the binary format of the machine in use, whether big-endian or little-endian, and, when reading from tape, which type of Unix system the tape drive belongs to.

An additional step that is sometimes needed in Seismic Unix is clearing the optional trace header fields in bytes 181-240. This part of the header is not governed by a particular format and is often adapted to custom needs. To avoid confusion in Seismic Unix, it may be necessary to zero out this section with the segyclean program, typically by piping the output of segyread through it.

Converting seismic data stored in the SU format back to SEG-Y requires restoring the headers that were stripped in the SU format. The first step is to recreate the ASCII and binary headers, which is accomplished with the segyhdrs program. This creates two files named "header" and "binary" in the current working directory. Options are available to adjust the binary header information. The file "header" is already in ASCII and can be edited in a simple text editor, as long as the layout of 40 lines by 80 characters is preserved. The binary header must first be converted to ASCII before it can be edited, which can be done with the bhedtopar program; this produces an editable parameter file.

Edited header values can then be assigned back into the binary header file with the setbhed program.

Individual header values may also be edited directly.
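The header-clearing step described earlier in this section can be illustrated on a single 240-byte trace header. This is only a sketch of the idea, not the segyclean implementation; the function name is hypothetical.

```python
def clear_optional_fields(trace_header: bytes) -> bytes:
    """Zero bytes 181-240 (1-based) of a 240-byte trace header."""
    if len(trace_header) != 240:
        raise ValueError("trace header must be exactly 240 bytes")
    # Bytes 1-180 are kept; the last 60 bytes are replaced with zeros.
    return trace_header[:180] + bytes(60)


original = bytes(range(240))  # made-up header content for demonstration
cleaned = clear_optional_fields(original)
print(cleaned[178:183])
```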

Conversion between SEG-D and SU
SEG-D has many format variations, and Seismic Unix can only read it: the segdread program reads a SEG-D file and converts it to the SU format. Seismic Unix can write files only in the SEG-Y format.

Conversion between SEG-D and SEG-Y
Within Seismic Unix it is possible to read a file with segdread and then follow the procedure for converting the SU format to SEG-Y. Effectively, this reads SEG-D into SU and then writes out a SEG-Y file. Conversion from SEG-Y to SEG-D is not possible within Seismic Unix.

Another option for converting SEG-D to SEG-Y is a freeware application found on Seismatters.com called the SEGD to SEGY converter. The author of this utility is not known, so the reliability of the application has not been verified (see the external link to the seismatters.com 3rd party seismic applications page).

Examples
Typical conversion of SEG-Y/SEG-D to SU, or conversion in all tape-reading situations:



Troubleshooting
In Seismic Unix there are a number of possible scenarios that can make a file not read properly. Seismic files often have slight variations from the standard that are still passed off as "SEG-Y". This can lead to errors when reading in data or writing out.

One such variation is trace data left in the IEEE floating point format. To read such data, segyread must be told not to apply its usual floating point conversion, which is done by setting its conv parameter to 0.

Another possible problem is trying to read little-endian data on a big-endian machine. To read such data, segyread is run with its endian parameter set to 0.

On a little-endian machine, the 0 would be replaced with a 1 in the endian parameter to prevent conversion to the big-endian format.
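The effect of reading a value with the wrong byte order can be seen with a two-byte example. The sketch below is independent of Seismic Unix; the helper name `needs_swap` is hypothetical.

```python
import struct
import sys


def needs_swap(data_byte_order: str) -> bool:
    """Return True when data in 'big' or 'little' order differs from this machine."""
    return data_byte_order != sys.byteorder


raw = bytes([0x03, 0xE8])                # the 16-bit value 1000, stored big-endian
as_big = struct.unpack(">h", raw)[0]     # interpreted correctly as big-endian
as_little = struct.unpack("<h", raw)[0]  # same bytes misread as little-endian
print(sys.byteorder, as_big, as_little)
```

A header field such as the number of samples per trace becomes nonsensical when misread this way, which is a common symptom of a byte-order mismatch.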

Future Trends in Seismic Data Formatting
As seismic acquisition evolves, so do the formatting standards. SEG-Y and SEG-D both have advantages and disadvantages; however, SEG-D appears to have become the leader in seismic data acquisition. The SEG works closely with the International Association of Oil and Gas Producers, which represents 80% of the oil and gas producers around the world. There have been several revisions to the SEG-D format, the most recent in 2012, whereas there have been only a few revisions to SEG-Y, the most recent in 2002. The latest SEG-D revision provides optional headers that make the acquisition process more user friendly, including the ability to record receiver coordinates, reference systems, crew and job information, and processing information. It is possible to record such information in the SEG-Y format, but the goal of the newest SEG-D revision was to standardize it so that it is recorded consistently from job to job. The SEG-D format also supports a greater variety of data formats, including the IEEE 8-byte format in the newest revision. Although the SEG-Y format may be preferable when the user needs freedom in the recording format, the SEG-D format has been adapted to the needs of the oil and gas industry through its standardized headers.