ZAbbrev has been ported from C# to C

I’ve ported ZAbbrev by @heasm66 from C# to C, mostly for people such as myself who would be interested in such a thing.

  • Full C implementation (src/*.c, *.h): string hashing, suffix arrays, abbreviation logic.
  • Uses GNU Autotools (configure.ac, Makefile.am): standard build/test/install flow.
  • Docs included: README, man page (doc/zabbrev.1), Texinfo (doc/zabbrev.texi).
  • Tests (tests/*.sh): basic suite, integrated with make check.
  • Being a derivative of the C# version, it continues to be licensed under GPLv3+.
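
For readers unfamiliar with the suffix-array idea mentioned in the list, here is a minimal sketch of how sorted suffixes expose repeated substrings. It is not taken from the port's sources: `longest_repeat` and its helpers are hypothetical names, and the simple `qsort`-based construction is an assumption chosen for brevity, not speed.

```c
#include <stdlib.h>
#include <string.h>

/* Text being indexed; file-scope because qsort comparators take no context. */
static const char *sa_text;

/* Compare two suffixes, given as start offsets into sa_text. */
static int cmp_suffix(const void *a, const void *b)
{
    size_t i = *(const size_t *)a;
    size_t j = *(const size_t *)b;
    return strcmp(sa_text + i, sa_text + j);
}

/* Length of the common prefix of two NUL-terminated strings. */
static size_t common_prefix(const char *a, const char *b)
{
    size_t n = 0;
    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/* Return the length of the longest substring occurring at least twice in
   text and store one start offset in *pos. Returns 0 if there is no repeat
   or if allocation fails. */
size_t longest_repeat(const char *text, size_t *pos)
{
    size_t n = strlen(text), best = 0, k;
    size_t *sa;

    *pos = 0;
    if (n < 2)
        return 0;
    sa = malloc(n * sizeof *sa);
    if (sa == NULL)
        return 0;
    for (k = 0; k < n; k++)
        sa[k] = k;
    sa_text = text;
    qsort(sa, n, sizeof *sa, cmp_suffix);

    /* Adjacent suffixes in sorted order share the longest prefixes. */
    for (k = 1; k < n; k++) {
        size_t lcp = common_prefix(text + sa[k - 1], text + sa[k]);
        if (lcp > best) {
            best = lcp;
            *pos = sa[k];
        }
    }
    free(sa);
    return best;
}
```

For "banana" this reports the three-character repeat "ana". Note that the two occurrences overlap, which is something an abbreviation picker would have to account for separately.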
12 Likes

Nice! Admire your courage to tackle my spaghetti-code.

How does the C version benchmark against the C# one? Are there any gains?

Feel free to make improvements if you find any. My main outstanding issue is finding a way to calculate the optimal abbreviations while taking the space lost to padding into account. The reason for the whole process of testing with added and removed leading and trailing spaces or other characters is that a slightly less optimal abbreviation sometimes reduces the space lost to padding more. As it is now, the process is both imperfect and unsatisfying…
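
To make the padding issue concrete, here is a small sketch. Z-machine text is stored in 2-byte words of three 5-bit z-chars each, so a string is padded up to a whole word, or up to a larger unit where alignment requires it. The function names and the `align_bytes` parameter are hypothetical and not part of ZAbbrev.

```c
#include <stddef.h>

/* Z-chars left empty when `zchars` z-chars are stored in units of
   `align_bytes` bytes. Every 2 bytes hold 3 z-chars, so align_bytes
   is assumed to be an even number such as 2, 4 or 8. */
size_t zstring_padding(size_t zchars, size_t align_bytes)
{
    size_t per_unit = (align_bytes / 2) * 3;  /* z-chars per aligned unit */
    size_t rem = zchars % per_unit;
    return rem == 0 ? 0 : per_unit - rem;
}

/* Bytes occupied by the string once padding is included. */
size_t zstring_bytes(size_t zchars, size_t align_bytes)
{
    return (zchars + zstring_padding(zchars, align_bytes)) / 3 * 2;
}
```

With 2-byte units, a string of 9 z-chars takes 6 bytes. An abbreviation that shortens it to 7 z-chars still leaves it at 6 bytes, because the two saved z-chars just become padding. That is why the saving in z-chars and the saving in bytes can disagree.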

(If anyone wants to extract the algorithm and incorporate it into a compiler, like Inform6 or Dialog, you have my full support and permission.)

1 Like

I appreciate the port to a much more universal language (I’m not exactly on friendly terms with .NET/Mono…)

Best regards from Italy,
dott. Piergiorgio.

2 Likes

I think it is unhelpful to start an OS war, something I notice you often do. Be happy with your own choice and keep critique of others’ choices to yourself.

I do my programming for my own gratification and not to please you.

C# is a widely used and legitimate language, and is sometimes listed as more popular than, for example, C: Most used languages among software developers globally 2024 | Statista

/end-of-rant

6 Likes

Clearly you are all wrong.

Write it in Rust!

I kid. I kid.

I do mostly Rust and C# but have been known to do C or even …shudder… C++ from time to time.

(But seriously, write it in Rust) :wink:

3 Likes

I’m Rust-curious

2 Likes

I actually set out to learn Go some years ago, and encountered an article comparing Go and Rust. The result was I learned Rust.

It wasn’t that the article was unflattering to Go, just that I became intrigued by the low-level, no garbage collection and yet no manual memory management shenanigans aspect.

Does it concern anyone that #2 on that list is HTML/CSS?

I do all my programming in HTML! :rofl:

3 Likes

The percentages add up to more than 100%, so I guess people use combinations. I personally would rank high with C# and SQL; they obviously fill different needs…

Henrik, your critique could have been worded differently… also, I noticed an edit, so perhaps you have already reworded something; perhaps it is best that I haven’t read the original? Mah.

Anyway.

Dunno about rust, a

Best regards from Italy,
dott. Piergiorgio.

I changed a spelling error, “it it” → “it is”. You can click on the pen to see edits. Of course I could have worded it differently, but that doesn’t change the substance.

Would you like to add a page for this on Form:Software - IFWiki?

(Maybe the original and the port should get their own pages, to make sense of the “Version” fields.)

That seems reasonable. Added: ZAbbrev - IFWiki

2 Likes

I tried to compile it on a virtual Ubuntu installation I have. Everything looked fine during compilation:

compilation log
henrik@henrik-VirtualBox-Ubuntu:~/zabbrev_c$ autoreconf -fi
henrik@henrik-VirtualBox-Ubuntu:~/zabbrev_c$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a race-free mkdir -p... /usr/bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether the compiler supports GNU C... yes
checking whether gcc accepts -g... yes
checking for gcc option to enable C11 features... none needed
checking whether gcc understands -c and -o together... yes
checking whether make supports the include directive... yes (GNU style)
checking dependency style of gcc... gcc3
checking that generated files are newer than configure... done
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/Makefile
config.status: creating doc/Makefile
config.status: creating tests/Makefile
config.status: creating config.h
config.status: config.h is unchanged
config.status: executing depfiles commands
henrik@henrik-VirtualBox-Ubuntu:~/zabbrev_c$ make
make  all-recursive
make[1]: Entering directory '/home/henrik/zabbrev_c'
Making all in src
make[2]: Entering directory '/home/henrik/zabbrev_c/src'
gcc -DHAVE_CONFIG_H -I. -I..    -std=c89 -Wall -Wextra -pedantic -Wconversion -Wsign-conversion -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-declarations -Wuninitialized -Wno-unused-parameter -O2 -g -O2 -MT zabbrev.o -MD -MP -MF .deps/zabbrev.Tpo -c -o zabbrev.o zabbrev.c
mv -f .deps/zabbrev.Tpo .deps/zabbrev.Po
gcc -DHAVE_CONFIG_H -I. -I..    -std=c89 -Wall -Wextra -pedantic -Wconversion -Wsign-conversion -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-declarations -Wuninitialized -Wno-unused-parameter -O2 -g -O2 -MT strhash.o -MD -MP -MF .deps/strhash.Tpo -c -o strhash.o strhash.c
mv -f .deps/strhash.Tpo .deps/strhash.Po
gcc -DHAVE_CONFIG_H -I. -I..    -std=c89 -Wall -Wextra -pedantic -Wconversion -Wsign-conversion -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-declarations -Wuninitialized -Wno-unused-parameter -O2 -g -O2 -MT suffix-array.o -MD -MP -MF .deps/suffix-array.Tpo -c -o suffix-array.o suffix-array.c
mv -f .deps/suffix-array.Tpo .deps/suffix-array.Po
gcc -DHAVE_CONFIG_H -I. -I..    -std=c89 -Wall -Wextra -pedantic -Wconversion -Wsign-conversion -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-declarations -Wuninitialized -Wno-unused-parameter -O2 -g -O2 -MT zabbrev-common.o -MD -MP -MF .deps/zabbrev-common.Tpo -c -o zabbrev-common.o zabbrev-common.c
mv -f .deps/zabbrev-common.Tpo .deps/zabbrev-common.Po
gcc -DHAVE_CONFIG_H -I. -I..    -std=c89 -Wall -Wextra -pedantic -Wconversion -Wsign-conversion -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-declarations -Wuninitialized -Wno-unused-parameter -O2 -g -O2 -MT xalloc.o -MD -MP -MF .deps/xalloc.Tpo -c -o xalloc.o xalloc.c
mv -f .deps/xalloc.Tpo .deps/xalloc.Po
gcc -std=c89 -Wall -Wextra -pedantic -Wconversion -Wsign-conversion -Wshadow -Wpointer-arith -Wcast-align -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -Wmissing-declarations -Wuninitialized -Wno-unused-parameter -O2 -g -O2   -o zabbrev zabbrev.o strhash.o suffix-array.o zabbrev-common.o xalloc.o  
make[2]: Leaving directory '/home/henrik/zabbrev_c/src'
Making all in doc
make[2]: Entering directory '/home/henrik/zabbrev_c/doc'
Updating ./version.texi
restore=: && backupdir=".am$$" && \
am__cwd=`pwd` && CDPATH="${ZSH_VERSION+.}:" && cd . && \
rm -rf $backupdir && mkdir $backupdir && \
if (/bin/bash '/home/henrik/zabbrev_c/missing' makeinfo --version) >/dev/null 2>&1; then \
  for f in zabbrev.info zabbrev.info-[0-9] zabbrev.info-[0-9][0-9] zabbrev.i[0-9] zabbrev.i[0-9][0-9]; do \
    if test -f $f; then mv $f $backupdir; restore=mv; else :; fi; \
  done; \
else :; fi && \
cd "$am__cwd"; \
if /bin/bash '/home/henrik/zabbrev_c/missing' makeinfo -D VERSION=dev-1-70cba75  -I . \
 -o zabbrev.info zabbrev.texi; \
then \
  rc=0; \
  CDPATH="${ZSH_VERSION+.}:" && cd .; \
else \
  rc=$?; \
  CDPATH="${ZSH_VERSION+.}:" && cd . && \
  $restore $backupdir/* `echo "./zabbrev.info" | sed 's|[^/]*$||'`; \
fi; \
rm -rf $backupdir; exit $rc
make[2]: Leaving directory '/home/henrik/zabbrev_c/doc'
Making all in tests
make[2]: Entering directory '/home/henrik/zabbrev_c/tests'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/henrik/zabbrev_c/tests'
make[2]: Entering directory '/home/henrik/zabbrev_c'
make[2]: Leaving directory '/home/henrik/zabbrev_c'
make[1]: Leaving directory '/home/henrik/zabbrev_c'

But when I try it, it produces somewhat strange output:

Output from zabbrev C port
henrik@henrik-VirtualBox-Ubuntu:~/if/curses$ zabbrev_c -i -x0
zabbrev dev-1-70cba75
Based on the C# version by Henrik Åsman, (c) 2021-2024
Ported to C by Jason Self, (c) 2025
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Reading file...
Done in 0.000 s
Building suffix array...
Done in 0.203 s
Building lcp array...
Done in 0.003 s
Extracting patterns...
Done in 0.058 s
Selecting abbreviations...
100%
Done in 3.269 s
Refining picked abbreviations...
Done in 0.055 s
  1 30015 
 
  2     6 

  3  4065 .
 
  4   342 
 
  5  7749  the 
  6   735  the
  7  1065 
 Th
  8    39 
 T
  9    60 the 
 10   453 
 The
 11   372 
 The 
 12   714  th
 13   120 .
 Th
 14    54 .
 T
 15   390  
 
 16   264 .
 The
 17   879 
 You
 18   384 he 
 19   243 e.
 
 20   276 
 You 
 21   246 the
 22   228 .
 The 
 23   813 

 
 24   270 
 A
 25   438 
 

 26  2385 , 
 27   168 t.
 
 28  2121  and 
 29     3 


 30  2259 ing 
 31  1854  of 
 32   105 
  
 33   192 .
 You
 34   159 .
 You 
 35  1758  you
 36   696  of the 
 37  1623  to 
 38   162 s.
 
 39   597  you 
 40     6 .
 Y
 41   612 "
 
 42   366  The 
 43   225  The
 44   177 and 
 45   252 .

 
 46    24  of the
 47   453 n the 
 48   138 
 "
 49   264 The 
 50     3 "

 51     9 of the 
 52     9 .


 53    75  and
 54    90 
 I
 55   111 e
 
 56   111 y.
 
 57   111 
  
 58   252 .
 

 59   219 .
 A
 60   501 ing
 61    90  Th
 62    45  of th
 63  1380  is 
 64    45 The
 65    24 f the 
 66     3 y.

 67   402  You 
 68   519 , and 
 69    69 
 A 
 70    27 n the
 71   372 "
 "
 72   414 to the 
 73    69 n.
 
 74    57 ."
 
 75   765  "
 76    66 you 
 77   264  an
 78    75 ng 
 79   156 ."
 "
 80   174 .
 I
 81   120  of
 82    63 d.
 
 83   960 


 84    60  You
 85   177  to
 86    30  of t
 87   378  in the 
 88    87 s
 
 89     3 n.

 90   150 nd 
 91   738  a 
 92   123 You 
 93    63 to 
 94   258 and
 95    12 of 
 96     3 f the
Original cost: 317163, after abbreviations: 240612
High memory strings (1383 strings):
   411 strings with  2 empty z-chars, total =      822,   25368 bytes
   455 strings with  1 empty z-chars, total =      455,   24472 bytes
   517 strings with  0 empty z-chars, total =        0,   26110 bytes
              -------
                75950 bytes
Dynamic and static memory strings (2906 strings):
  1032 strings with  2 empty z-chars, total =     2064,   35320 bytes
   926 strings with  1 empty z-chars, total =      926,   34240 bytes
   948 strings with  0 empty z-chars, total =        0,   32756 bytes
              -------
               102316 bytes
     ===============
Total:         4267,  178266 bytes

Storage size for the 96 abbreviations:     602 bytes
Storage size for the strings:          +  178266 bytes
                                      =========
                                        178868 bytes

Total elapsed time: 3.717 s
Output from zabbrev (C#), with same parameters
henrik@henrik-VirtualBox-Ubuntu:~/if/curses$ zabbrev -i -x0
ZAbbrev 0.12 (28th December 2024, in development) by Henrik Åsman, (c) 2021-2024
Highly optimized abbreviations computed efficiently

Processing files in directory: /home/henrik/if/curses
Compression level            : fastest

Progress                               Time (s) Mem (MB)
--------                               -------- --------
Reading file...                           0.067    76.00  256 361 characters, utf-8, Unicode (UTF-8)
Building suffix arrays...                 0.853   131.00  32 858 794 825 potential patterns
Extracting viable patterns...             0.360   143.00  4 289 strings, 302 442 patterns extracted
Build max heap with naive score...        0.059   165.00  266 261 patterns added to heap
Rescoring with optimal parse...    100%   5.670   216.00  Total saving 59 440 z-chars, text size = 147 846 bytes

Total elapsed time:   7.056 s

Abbreviations would save 59 440 z-chars total (~39 627 bytes)

High memory strings (1 383 strings):
   197 strings with  5 empty z-chars, total =      985,  10 604 bytes
   258 strings with  4 empty z-chars, total =    1 032,  10 668 bytes
   259 strings with  3 empty z-chars, total =      777,  12 648 bytes
   243 strings with  2 empty z-chars, total =      486,  11 536 bytes
   203 strings with  1 empty z-chars, total =      203,   9 636 bytes
   223 strings with  0 empty z-chars, total =        0,   9 896 bytes
                                                        -------
                                                         64 988 bytes
Dynamic and static memory strings (2 906 strings):
   903 strings with  2 empty z-chars, total =    1 806,  27 116 bytes
 1 073 strings with  1 empty z-chars, total =    1 073,  28 834 bytes
   930 strings with  0 empty z-chars, total =        0,  26 574 bytes
                                                        -------
                                                         82 524 bytes
                                                ===============
Total:                                           6 362, 147 512 bytes

Storage size for the 96 abbreviations:                      334 bytes
Storage size for the strings:                         + 147 512 bytes
                                                      =========
                                                        147 846 bytes

Abbreviate " through";                      !    86x 8, saved   507
Abbreviate " little ";                      !    53x 8, saved   309
Abbreviate " of the ";                      !   349x 8, saved  2085
Abbreviate " in the ";                      !   207x 8, saved  1233
Abbreviate " on the ";                      !   116x 8, saved   687
Abbreviate " Meldrew";                      !    41x 9, saved   278
Abbreviate " which ";                       !   144x 7, saved   711
Abbreviate "to the ";                       !   279x 7, saved  1386
Abbreviate " your ";                        !   209x 6, saved   830
Abbreviate " is a ";                        !   143x 6, saved   566
Abbreviate "Austin";                        !    52x 7, saved   251
Abbreviate ", and ";                        !   287x 7, saved  1426
Abbreviate ", but ";                        !   168x 7, saved   831
Abbreviate " with ";                        !   264x 6, saved  1050
Abbreviate ", the ";                        !   116x 7, saved   571
Abbreviate " about";                        !   110x 6, saved   434
Abbreviate "north";                         !   102x 5, saved   300
Abbreviate " from";                         !   193x 5, saved   573
Abbreviate " are ";                         !   222x 5, saved   660
Abbreviate " some";                         !   202x 5, saved   600
Abbreviate " you ";                         !   534x 5, saved  1596
Abbreviate "have ";                         !   164x 5, saved   486
Abbreviate " that";                         !   201x 5, saved   597
Abbreviate "thing";                         !   272x 5, saved   810
Abbreviate " down";                         !   136x 5, saved   402
Abbreviate "ould ";                         !   103x 5, saved   303
Abbreviate " and ";                         !   477x 5, saved  1425
Abbreviate " the ";                         !  1366x 5, saved  4092
Abbreviate "round";                         !    94x 5, saved   276
Abbreviate " seem";                         !   113x 5, saved   333
Abbreviate ". The";                         !   151x 7, saved   746
Abbreviate "east";                          !   133x 4, saved   260
Abbreviate "his ";                          !   254x 4, saved   502
Abbreviate "one ";                          !   167x 4, saved   328
Abbreviate "n't ";                          !   226x 5, saved   672
Abbreviate " in ";                          !   311x 4, saved   616
Abbreviate "ight";                          !   386x 4, saved   766
Abbreviate "here";                          !   399x 4, saved   792
Abbreviate "look";                          !   139x 4, saved   272
Abbreviate "    ";                          !   186x 4, saved   366
Abbreviate "tion";                          !   241x 4, saved   476
Abbreviate " it ";                          !   209x 4, saved   412
Abbreviate " for";                          !   308x 4, saved   610
Abbreviate "side";                          !   173x 4, saved   340
Abbreviate "You ";                          !   401x 5, saved  1197
Abbreviate " is ";                          !   526x 4, saved  1046
Abbreviate " to ";                          !   743x 4, saved  1480
Abbreviate " of ";                          !   755x 4, saved  1504
Abbreviate "The ";                          !   504x 5, saved  1506
Abbreviate " you";                          !   249x 4, saved   492
Abbreviate "ing ";                          !   728x 4, saved  1450
Abbreviate "able";                          !   152x 4, saved   298
Abbreviate "ough";                          !   137x 4, saved   268
Abbreviate " co";                           !   338x 3, saved   335
Abbreviate "er ";                           !   502x 3, saved   499
Abbreviate "...";                           !   155x 6, saved   614
Abbreviate " a ";                           !   619x 3, saved   616
Abbreviate " th";                           !   185x 3, saved   182
Abbreviate " st";                           !   300x 3, saved   297
Abbreviate "'s ";                           !   336x 4, saved   666
Abbreviate "ed ";                           !   595x 3, saved   592
Abbreviate "ly ";                           !   617x 3, saved   614
Abbreviate " in";                           !   412x 3, saved   409
Abbreviate "ill";                           !   223x 3, saved   220
Abbreviate "est";                           !   226x 3, saved   223
Abbreviate "way";                           !   250x 3, saved   247
Abbreviate " on";                           !   392x 3, saved   389
Abbreviate "all";                           !   544x 3, saved   541
Abbreviate "and";                           !   442x 3, saved   439
Abbreviate "ous";                           !   268x 3, saved   265
Abbreviate "ear";                           !   300x 3, saved   297
Abbreviate "ain";                           !   295x 3, saved   292
Abbreviate "ver";                           !   304x 3, saved   301
Abbreviate "en ";                           !   338x 3, saved   335
Abbreviate "es ";                           !   302x 3, saved   299
Abbreviate "rea";                           !   308x 3, saved   305
Abbreviate "ter";                           !   358x 3, saved   355
Abbreviate "as ";                           !   263x 3, saved   260
Abbreviate "s, ";                           !   272x 4, saved   538
Abbreviate "ing";                           !   601x 3, saved   598
Abbreviate "st ";                           !   319x 3, saved   316
Abbreviate "ard";                           !   268x 3, saved   265
Abbreviate "re ";                           !   260x 3, saved   257
Abbreviate "an ";                           !   398x 3, saved   395
Abbreviate " be";                           !   430x 3, saved   427
Abbreviate "out";                           !   374x 3, saved   371
Abbreviate "ent";                           !   475x 3, saved   472
Abbreviate "e, ";                           !   261x 4, saved   516
Abbreviate "the";                           !   527x 3, saved   524
Abbreviate "al ";                           !   248x 3, saved   245
Abbreviate "hat";                           !   214x 3, saved   211
Abbreviate "s.";                            !   226x 3, saved   223
Abbreviate ". ";                            !   782x 3, saved   779
Abbreviate ".^";                            !   212x 4, saved   418
Abbreviate "e.";                            !   289x 3, saved   286
Abbreviate ", ";                            !  1003x 3, saved  1000

gametext.txt (296.5 KB)

What am I doing wrong?

Thanks for reporting this; it’s now fixed in version dev-2-a10f367. Please let me know if you encounter any further issues.

2 Likes

There is something strange about which abbreviations are found (same gametext.txt as above). Comparing the C# and C versions with the same parameters should theoretically give the same result, right?

Using: zabbrev -x0 -i

C#
Abbreviate " through";                      !    86x 8, saved   507
Abbreviate " little ";                      !    53x 8, saved   309
Abbreviate " of the ";                      !   349x 8, saved  2085
Abbreviate " in the ";                      !   207x 8, saved  1233
Abbreviate " on the ";                      !   116x 8, saved   687
Abbreviate " Meldrew";                      !    41x 9, saved   278
Abbreviate " which ";                       !   144x 7, saved   711
Abbreviate "to the ";                       !   279x 7, saved  1386
Abbreviate " your ";                        !   209x 6, saved   830
Abbreviate " is a ";                        !   143x 6, saved   566
Abbreviate "Austin";                        !    52x 7, saved   251
Abbreviate ", and ";                        !   287x 7, saved  1426
Abbreviate ", but ";                        !   168x 7, saved   831
Abbreviate " with ";                        !   264x 6, saved  1050
Abbreviate ", the ";                        !   116x 7, saved   571
Abbreviate " about";                        !   110x 6, saved   434
Abbreviate "north";                         !   102x 5, saved   300
Abbreviate " from";                         !   193x 5, saved   573
Abbreviate " are ";                         !   222x 5, saved   660
Abbreviate " some";                         !   202x 5, saved   600
Abbreviate " you ";                         !   534x 5, saved  1596
Abbreviate "have ";                         !   164x 5, saved   486
Abbreviate " that";                         !   201x 5, saved   597
Abbreviate "thing";                         !   272x 5, saved   810
Abbreviate " down";                         !   136x 5, saved   402
Abbreviate "ould ";                         !   103x 5, saved   303
Abbreviate " and ";                         !   477x 5, saved  1425
Abbreviate " the ";                         !  1366x 5, saved  4092
Abbreviate "round";                         !    94x 5, saved   276
Abbreviate " seem";                         !   113x 5, saved   333
Abbreviate ". The";                         !   151x 7, saved   746
Abbreviate "east";                          !   133x 4, saved   260
Abbreviate "his ";                          !   254x 4, saved   502
Abbreviate "one ";                          !   167x 4, saved   328
Abbreviate "n't ";                          !   226x 5, saved   672
Abbreviate " in ";                          !   311x 4, saved   616
Abbreviate "ight";                          !   386x 4, saved   766
Abbreviate "here";                          !   399x 4, saved   792
Abbreviate "look";                          !   139x 4, saved   272
Abbreviate "    ";                          !   186x 4, saved   366
Abbreviate "tion";                          !   241x 4, saved   476
Abbreviate " it ";                          !   209x 4, saved   412
Abbreviate " for";                          !   308x 4, saved   610
Abbreviate "side";                          !   173x 4, saved   340
Abbreviate "You ";                          !   401x 5, saved  1197
Abbreviate " is ";                          !   526x 4, saved  1046
Abbreviate " to ";                          !   743x 4, saved  1480
Abbreviate " of ";                          !   755x 4, saved  1504
Abbreviate "The ";                          !   504x 5, saved  1506
Abbreviate " you";                          !   249x 4, saved   492
Abbreviate "ing ";                          !   728x 4, saved  1450
Abbreviate "able";                          !   152x 4, saved   298
Abbreviate "ough";                          !   137x 4, saved   268
Abbreviate " co";                           !   338x 3, saved   335
Abbreviate "er ";                           !   502x 3, saved   499
Abbreviate "...";                           !   155x 6, saved   614
Abbreviate " a ";                           !   619x 3, saved   616
Abbreviate " th";                           !   185x 3, saved   182
Abbreviate " st";                           !   300x 3, saved   297
Abbreviate "'s ";                           !   336x 4, saved   666
Abbreviate "ed ";                           !   595x 3, saved   592
Abbreviate "ly ";                           !   617x 3, saved   614
Abbreviate " in";                           !   412x 3, saved   409
Abbreviate "ill";                           !   223x 3, saved   220
Abbreviate "est";                           !   226x 3, saved   223
Abbreviate "way";                           !   250x 3, saved   247
Abbreviate " on";                           !   392x 3, saved   389
Abbreviate "all";                           !   544x 3, saved   541
Abbreviate "and";                           !   442x 3, saved   439
Abbreviate "ous";                           !   268x 3, saved   265
Abbreviate "ear";                           !   300x 3, saved   297
Abbreviate "ain";                           !   295x 3, saved   292
Abbreviate "ver";                           !   304x 3, saved   301
Abbreviate "en ";                           !   338x 3, saved   335
Abbreviate "es ";                           !   302x 3, saved   299
Abbreviate "rea";                           !   308x 3, saved   305
Abbreviate "ter";                           !   358x 3, saved   355
Abbreviate "as ";                           !   263x 3, saved   260
Abbreviate "s, ";                           !   272x 4, saved   538
Abbreviate "ing";                           !   601x 3, saved   598
Abbreviate "st ";                           !   319x 3, saved   316
Abbreviate "ard";                           !   268x 3, saved   265
Abbreviate "re ";                           !   260x 3, saved   257
Abbreviate "an ";                           !   398x 3, saved   395
Abbreviate " be";                           !   430x 3, saved   427
Abbreviate "out";                           !   374x 3, saved   371
Abbreviate "ent";                           !   475x 3, saved   472
Abbreviate "e, ";                           !   261x 4, saved   516
Abbreviate "the";                           !   527x 3, saved   524
Abbreviate "al ";                           !   248x 3, saved   245
Abbreviate "hat";                           !   214x 3, saved   211
Abbreviate "s.";                            !   226x 3, saved   223
Abbreviate ". ";                            !   782x 3, saved   779
Abbreviate ".^";                            !   212x 4, saved   418
Abbreviate "e.";                            !   289x 3, saved   286
Abbreviate ", ";                            !  1003x 3, saved  1000
C
Abbreviate "^ ";                            !  4402x 5, saved 13206
Abbreviate ".^ ";                           !  2034x 7, saved  4068
Abbreviate ".^";                            !  2296x 6, saved  1047
Abbreviate " the ";                         !  2616x 5, saved  7746
Abbreviate " the";                          !  3002x 4, saved   732
Abbreviate "the ";                          !  2622x 4, saved    63
Abbreviate " th";                           !  3749x 3, saved   717
Abbreviate "he ";                           !  3382x 3, saved   753
Abbreviate "^ Th";                          !   558x 8, saved   696
Abbreviate "the";                           !  3244x 3, saved   249
Abbreviate "^ The";                         !   453x 9, saved   453
Abbreviate "^ T";                           !   605x 7, saved    42
Abbreviate "^ The ";                        !   371x10, saved   369
Abbreviate "^^";                            !   469x 8, saved   864
Abbreviate ", ";                            !  2641x 3, saved  2403
Abbreviate " and ";                         !   857x 5, saved  2127
Abbreviate "ing ";                          !  1223x 4, saved  2283
Abbreviate ".^ Th";                         !   305x10, saved   120
Abbreviate ".^ The";                        !   266x11, saved   264
Abbreviate " of ";                          !  1184x 4, saved  1869
Abbreviate ".^ T";                          !   332x 9, saved    54
Abbreviate ".^ The ";                       !   226x12, saved   228
Abbreviate "^ You ";                        !   277x10, saved  1047
Abbreviate "^ You";                         !   310x 9, saved   105
Abbreviate " ^";                            !   707x 5, saved   420
Abbreviate " you";                          !  1060x 4, saved  1758
Abbreviate " of the ";                      !   352x 8, saved   696
Abbreviate ".^^";                           !   262x10, saved   363
Abbreviate "e.^ ";                          !   339x 8, saved   228
Abbreviate " to ";                          !  1008x 4, saved  1638
Abbreviate " you ";                         !   662x 5, saved   600
Abbreviate "^^ ";                           !   281x 9, saved    66
Abbreviate " The ";                         !   480x 6, saved   336
Abbreviate " The";                          !   635x 5, saved   216
Abbreviate "and ";                          !   951x 4, saved   177
Abbreviate " ^ ";                           !   462x 6, saved   291
Abbreviate "e.^";                           !   369x 7, saved    27
Abbreviate " of the";                       !   365x 7, saved    24
Abbreviate "n the ";                        !   451x 6, saved   453
Abbreviate "The ";                          !   595x 5, saved   228
Abbreviate "of the ";                       !   352x 7, saved     9
Abbreviate " and";                          !   861x 4, saved    75
Abbreviate "^ ^";                           !   245x 9, saved   171
Abbreviate ".^ You ";                       !   163x12, saved   327
Abbreviate "^  ";                           !   405x 6, saved   192
Abbreviate "ing";                           !  1601x 3, saved   501
Abbreviate ".^ You";                        !   176x11, saved    27
Abbreviate " Th";                           !   780x 4, saved    75
Abbreviate "t.^ ";                          !   260x 8, saved   168
Abbreviate " of th";                        !   388x 6, saved    45
Abbreviate " is ";                          !   770x 4, saved  1383
Abbreviate "The";                           !   770x 4, saved    42
Abbreviate "^ A";                           !   304x 7, saved   294
Abbreviate "f the ";                        !   378x 6, saved    24
Abbreviate " You ";                         !   367x 6, saved   360
Abbreviate ", and ";                        !   294x 7, saved   519
Abbreviate "n the";                         !   478x 5, saved    27
Abbreviate "t.^";                           !   285x 7, saved    21
Abbreviate "to the ";                       !   279x 7, saved   411
Abbreviate " "";                            !   458x 5, saved   879
Abbreviate "you ";                          !   663x 4, saved    66
Abbreviate " an";                           !  1318x 3, saved   267
Abbreviate "ng ";                           !  1316x 3, saved    72
Abbreviate ".^^ ";                          !   147x11, saved    57
Abbreviate " of";                           !  1294x 3, saved   120
Abbreviate " You";                          !   420x 5, saved    60
Abbreviate " to";                           !  1257x 3, saved   177
Abbreviate " of t";                         !   418x 5, saved    30
Abbreviate " in the ";                      !   209x 8, saved   378
Abbreviate ".^ Y";                          !   178x 9, saved     3
Abbreviate ""^ ";                           !   178x 9, saved   606
Abbreviate "nd ";                           !  1239x 3, saved   147
Abbreviate " a ";                           !  1228x 3, saved   738
Abbreviate "You ";                          !   408x 5, saved   123
Abbreviate "to ";                           !  1211x 3, saved    60
Abbreviate "and";                           !  1206x 3, saved   261
Abbreviate "s.^ ";                          !   201x 8, saved   156
Abbreviate "of ";                           !  1195x 3, saved    12
Abbreviate "f the";                         !   395x 5, saved     3
Abbreviate ", and";                         !   295x 6, saved    15
Abbreviate "s.^";                           !   236x 7, saved    30
Abbreviate "o the ";                        !   294x 6, saved    15
Abbreviate " in ";                          !   588x 4, saved   546
Abbreviate "       ";                       !   235x 7, saved   276
Abbreviate "      ";                        !   291x 6, saved    60
Abbreviate "     ";                         !   382x 5, saved    24
Abbreviate "^ "";                           !   164x 9, saved    93
Abbreviate "        ";                      !   191x 8, saved    21
Abbreviate ""^";                            !   191x 8, saved   213
Abbreviate "to the";                        !   284x 6, saved     6
Abbreviate " the s";                        !   283x 6, saved   147
Abbreviate ".^ ^";                          !   125x11, saved   204
Abbreviate "is ";                           !  1110x 3, saved   282
Abbreviate "in the ";                       !   222x 7, saved    21
Abbreviate " with ";                        !   272x 6, saved   882
Abbreviate "n th";                          !   544x 4, saved    39

especially that it finds, for example, all of these useful:

Abbreviate " the ";                         !  2616x 5, saved  7746
Abbreviate " the";                          !  3002x 4, saved   732
Abbreviate "the ";                          !  2622x 4, saved    63

The calculation of savings doesn’t seem entirely correct.

Also, if I try to run it without the -x0 it gets stuck in an infinite loop and never exits. This problem may be related to the above.

The C port doesn’t mimic the original behavior.

The C# version keeps the ^ character in text lines and joins lines using a vertical tab (\v). This prevents patterns from crossing line boundaries. The C port replaces ^ with newline (\n) and concatenates lines with newline as well. This allows patterns to span lines, producing abbreviations like "^ " or " the" that never appear in the C# results. Should cross-line abbreviations be eliminated? That should yield results similar to the C# version.
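To make the separator idea concrete, here is a minimal sketch in C. It is not taken from either implementation: `STRING_SEPARATOR`, `join_strings` and `count_occurrences` are hypothetical names, and the only thing borrowed from the description above is the use of a vertical tab between game strings. Any candidate pattern that contains the separator is rejected, so no counted match can straddle two strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical separator, mirroring the vertical tab described above. */
#define STRING_SEPARATOR '\v'

/* Join n strings into out (capacity cap), placing STRING_SEPARATOR between
   them. Returns the joined length, or (size_t)-1 if out is too small. */
static size_t join_strings(const char *const *strings, size_t n,
                           char *out, size_t cap)
{
    size_t pos = 0;
    assert(out != NULL && cap > 0);
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strings[i]);
        size_t need = len + (i > 0 ? 1 : 0);  /* text plus optional separator */
        if (pos + need + 1 > cap)             /* +1 keeps room for the NUL */
            return (size_t)-1;
        if (i > 0)
            out[pos++] = STRING_SEPARATOR;
        memcpy(out + pos, strings[i], len);
        pos += len;
    }
    out[pos] = '\0';
    return pos;
}

/* Count non-overlapping occurrences of pattern in the joined text.
   A pattern containing the separator is never a valid candidate, so
   every counted match lies entirely inside one game string. */
static size_t count_occurrences(const char *joined, const char *pattern)
{
    size_t plen = strlen(pattern);
    size_t count = 0;
    if (plen == 0 || strchr(pattern, STRING_SEPARATOR) != NULL)
        return 0;
    for (const char *p = strstr(joined, pattern); p != NULL;
         p = strstr(p + plen, pattern))
        count++;
    return count;
}
```

With the two strings "of man;" and "A mighty maze!", plain concatenation would yield one occurrence of ";A", while the joined text above reports none, and "ma" is still counted once in each string.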

Because newline-separated patterns occur very frequently, the C version reports different savings. It also leads to a greatly expanded search space, so running with default compression (-x2) can require many rescore passes. The “infinite loop” likely results from the replace_with_heap phase running for up to NUMBER_OF_PASSES_DEFAULT (10000) iterations when the compression level is ≥1; the loop only stops when the pass reaches that value or the heap is empty. With many patterns, this step can run very long, so level -x0 (which skips it) returns quickly.

Ok, I see.

I think cross-line abbreviations should be eliminated if the lines come from different game text strings. An abbreviation can’t be applied across string boundaries.

Consider these strings ({text} = applied abbreviation):

39A88 S2264 "-- Dr Johnson (a let{ter}{ from }1775)"
39AA4 S2265 "Expatiate free o'{er }{all} t{his }scene{ of }man;"
39AC0 S2266 "A m{ight}y maze! but not with{out}{ a }plan."

Between S2265 and S2266 you could find an abbreviation like "^A ", but that abbreviation can’t be used here and the calculations go wrong.

Maybe you could use some other pattern between strings to indicate an unusable abbreviation?

I’ve made some updates to the C implementation with version dev-4-5d91346. However, there may still be some variation in the abbreviation lists because abbreviations generated by the C# implementation are not deterministic. Two examples of non-determinism are:

  • In terms of pattern extraction and candidate ordering, the C# implementation stores candidates in a Dictionary<string, PatternData> and iterates over it when building the heap. The enumeration order of a C# Dictionary is unspecified, so the heap may receive candidates in a different sequence. This can affect which patterns are selected and, therefore, the resulting abbreviation list. The C version avoids this by using a stable sorting algorithm on its list of candidates before the heap-based selection process.

  • The C# implementation sorts abbreviations with list.Sort and then reverses the list. This is not a stable sort and can reorder abbreviations that have the same length. The C version performs a stable insertion sort.
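As an illustration of the second point, a stable insertion sort by descending length could look like the sketch below. This is not the code from the C port; `sort_by_length_desc` is a hypothetical name, and the sketch only shows the property that matters here: entries of equal length keep their original relative order, so the output is the same on every run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stable insertion sort, longest abbreviation first.
   Entries of equal length are never moved past each other. */
static void sort_by_length_desc(const char **abbrevs, size_t n)
{
    assert(abbrevs != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        const char *key = abbrevs[i];
        size_t key_len = strlen(key);
        size_t j = i;
        /* Shift only strictly shorter entries; ties stay in place. */
        while (j > 0 && strlen(abbrevs[j - 1]) < key_len) {
            abbrevs[j] = abbrevs[j - 1];
            j--;
        }
        abbrevs[j] = key;
    }
}
```

Given " the", "and ", " the ", "ing", "The " in that order, the result is " the " first, then " the", "and ", "The " in their original order, then "ing".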

There’s also some non-determinism in the C version, and I plan to address it in the future, along with more testing.

Minor differences in how each language handles aspects such as character encoding, string manipulation, and floating-point arithmetic can also contribute to variations in the calculated scores.

All in all, they’re based on the same algorithm, and even if the abbreviation lists are not identical, each should be of comparable quality and effectiveness. There are still additional tasks to be completed in the C version, and I plan to address them later. I also remain open to bug reports.

Looks nice!

Sorting the abbreviations by length is only a relic from when the compilers didn’t use the optimal-parse algorithm when applying the abbreviations, and it shouldn’t matter any more. Before optimal parse, the abbreviations were applied to a string in the order they were listed, starting from the front of the string. As a result, an already applied abbreviation could ruin the use of a later one, which is not optimal. By sorting them by length (longer = higher saving) we ensured that the more valuable abbreviation got used. In other words, the resulting file got smaller when the list was sorted. Optimal parse does this much better, and the order of the abbreviations should no longer matter. Optimal parse was introduced in Inform 6.36, but in older versions the sorting is still relevant.
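The difference can be sketched in C under a deliberately simplified cost model, which is an assumption made for illustration: every literal character costs 1 Z-char and every abbreviation reference costs 2. The function names are hypothetical, and this is not how Inform implements either strategy. `greedy_cost` applies abbreviations in list order from the front of the string; `optimal_cost` uses dynamic programming from the end of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ABBREV_COST  2   /* assumed Z-chars per abbreviation reference */
#define LITERAL_COST 1   /* assumed Z-chars per literal character */
#define MAX_TEXT     256 /* arbitrary limit for this sketch */

/* Old-style application: take abbreviations in list order and claim every
   match, scanning from the front, unless it overlaps an earlier claim. */
static size_t greedy_cost(const char *text, const char *const *abbrevs, size_t n)
{
    size_t len = strlen(text);
    char used[MAX_TEXT] = {0};
    size_t cost = len * LITERAL_COST;
    assert(len < MAX_TEXT);
    for (size_t a = 0; a < n; a++) {
        size_t alen = strlen(abbrevs[a]);
        if (alen == 0 || alen > len)
            continue;
        for (size_t i = 0; i + alen <= len; ) {
            int free_span = memcmp(text + i, abbrevs[a], alen) == 0;
            for (size_t k = 0; free_span && k < alen; k++)
                if (used[i + k])
                    free_span = 0;
            if (free_span) {
                memset(used + i, 1, alen);
                cost = cost - alen * LITERAL_COST + ABBREV_COST;
                i += alen;
            } else {
                i++;
            }
        }
    }
    return cost;
}

/* Optimal parse: best[i] is the cheapest encoding of text[i..end]. */
static size_t optimal_cost(const char *text, const char *const *abbrevs, size_t n)
{
    size_t len = strlen(text);
    size_t best[MAX_TEXT + 1];
    assert(len < MAX_TEXT);
    best[len] = 0;
    for (size_t i = len; i-- > 0; ) {
        best[i] = LITERAL_COST + best[i + 1];
        for (size_t a = 0; a < n; a++) {
            size_t alen = strlen(abbrevs[a]);
            if (alen > 0 && i + alen <= len &&
                memcmp(text + i, abbrevs[a], alen) == 0 &&
                ABBREV_COST + best[i + alen] < best[i])
                best[i] = ABBREV_COST + best[i + alen];
        }
    }
    return best[0];
}
```

For the text "in the end" with the abbreviations "n th" and " the ", the greedy cost depends on which abbreviation is listed first, because the shorter one blocks the longer one when it is applied first. The dynamic-programming result is the same for both orders.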