
@thawk

@thawk thawk commented Feb 3, 2023

Because the first three tags in a FIX message are 8/9/35, and we can assume the same separator is used throughout the message, we can detect the separator from these three tags.
Because the value of 9= is always numeric, we can take the string from the first non-digit character after 9= up to 35= as the separator.
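
As an illustration of the detection rule described above, here is a minimal Python sketch. It is not the code from this PR; the names `detect_separator` and `split_fields` are hypothetical, and it assumes the message starts with `8=` followed by `9=` and `35=` as the next two tags.

```python
import re
from typing import List, Optional

# Assumption: the message begins with tag 8, then tag 9 (digits only), then tag 35.
# The captured group starts at the first non-digit character after the
# BodyLength value and ends just before "35=".
_HEADER_RE = re.compile(r"^8=.*?9=\d+(\D.*?)35=", re.DOTALL)


def detect_separator(message: str) -> Optional[str]:
    """Return the separator inferred from the 8/9/35 header, or None."""
    match = _HEADER_RE.search(message)
    return match.group(1) if match else None


def split_fields(message: str) -> List[str]:
    """Split a message into tag=value fields using the detected separator."""
    separator = detect_separator(message)
    if separator is None:
        # No standard header found: return the message unsplit.
        return [message]
    # Drop the empty string left behind by a trailing separator.
    return [field for field in message.split(separator) if field]


if __name__ == "__main__":
    print(split_fields("8=FIX.4.2[SOH]9=130[SOH]35=AE[SOH]49=LSEHub[SOH]"))
```

Because the separator is captured as a whole string, multi-character separators such as `[SOH]` need no special handling.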

@thawk thawk changed the title allow multiple char seperator auto detect char seperator Feb 3, 2023
@thawk thawk changed the title auto detect char seperator auto detect seperator Feb 3, 2023
Using the string between field 8= and 35= as separator.
Allow multiple char separators.
@thawk thawk force-pushed the auto_detect_delimiter branch from c6fd624 to d4e6cc8 Compare February 3, 2023 07:14
@drewnoakes
Owner

Thanks for the PR! Can you share some data to test this change with, that shows a case where it improves parsing?

@thawk
Author

thawk commented Feb 24, 2023

Because different software produces different formats of FIX log, the SOH character gets replaced by different visible strings. Even nul (0x01) is used to replace SOH, for some reason I don't know :-(

The following are several types of log we have encountered:

8=FIX.4.2<SOH>9=130<SOH>35=AE<SOH>49=LSEHub<SOH>56=LSETR<SOH>115=BROKERX<SOH>34=2287<SOH>43=N<SOH>52=20120330-12:14:09<SOH>370=20120330-12:14:09.816<SOH>571=00008661533TRLO1-1-1-0<SOH>150=H<SOH>10=074<SOH>
8=FIX.4.2[SOH]9=130[SOH]35=AE[SOH]49=LSEHub[SOH]56=LSETR[SOH]115=BROKERX[SOH]34=2287[SOH]43=N[SOH]52=20120330-12:14:09[SOH]370=20120330-12:14:09.816[SOH]571=00008661533TRLO1-1-1-0[SOH]150=H[SOH]10=074[SOH]
8=FIX.4.2;9=130;35=AE;49=LSEHub;56=LSETR;115=BROKERX;34=2287;43=N;52=20120330-12:14:09;370=20120330-12:14:09.816;571=00008661533TRLO1-1-1-0;150=H;10=074;

@drewnoakes
Owner

> Because the first 3 tags in FIX message is 8/9/35

Are we sure about that? I don't work much with FIX at the moment. My recollection of FIX is that there are very few standard behaviours in the wild. I'm all for auto-detecting the separator, but I am concerned about doing so based on an assumption that might not be universally true.

@whatthefrog
Collaborator

whatthefrog commented Feb 24, 2023 via email

@thawk
Author

thawk commented Mar 1, 2023

> > Because the first 3 tags in FIX message is 8/9/35
>
> Are we sure about that? I don't work much with FIX at the moment. My recollection of FIX is that there are very few standard behaviours in the wild. I'm all for auto-detecting the separator, but I am concerned about doing so based on an assumption that might not be universally true.

On page 20 of the FINANCIAL INFORMATION EXCHANGE PROTOCOL (FIX) Version 5.0 Service Pack 2, Volume 1, in the section FIX "Tag=Value" SYNTAX, under Message Format, rule 2 says:

The first three fields in the standard header are Begin String (tag #8) followed by BodyLength (tag #9) followed by MsgType (tag #35).

So if a message uses the standard header, this should work. If not, the algorithm falls back to one of the separators in /\||;|\x001|\[SOH\]|<SOH>|\^A/. This extends the list of supported separators with three multi-character separators, [SOH], <SOH> and ^A, which I have encountered in my work.
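
For readers who want to see how a fallback of this kind might behave, here is a rough Python sketch. It is not the PR's implementation. The function name `split_with_fallback` is hypothetical, and the pattern assumes that the `\x001` alternative in the list above is meant to match the SOH control character (0x01), which is a guess.

```python
import re
from typing import List

# Assumption: "\x01" stands in for the "\x001" alternative quoted above.
# The other alternatives are copied from the list in the comment:
# "|", ";", "[SOH]", "<SOH>" and "^A".
_FALLBACK_RE = re.compile(r"\||;|\x01|\[SOH\]|<SOH>|\^A")


def split_with_fallback(message: str) -> List[str]:
    """Split a message on any of the known separators.

    Intended for messages that lack the standard 8/9/35 header, where the
    separator cannot be inferred from the header itself.
    """
    # re.split leaves an empty string after a trailing separator; drop it.
    return [field for field in _FALLBACK_RE.split(message) if field]


if __name__ == "__main__":
    print(split_with_fallback("35=AE^A49=LSEHub^A56=LSETR^A"))
```

One limitation of this sketch is that it splits on every listed separator at once, so a field value containing `;` would be cut in two in a message that is actually delimited by `|`. Detecting a single separator from the header avoids that ambiguity.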


3 participants