I have long been frustrated by BBEdit's refusal to edit files that, for good reason, have different line-ending characters throughout, like BBEdit Worksheet files, for instance.
Now I have another, similar problem. Adobe has changed the format of its .fdf (form data) files. For almost a decade I have been filling out US tax forms by creating fdf files with Excel macros. I could download the blank PDFs from the US IRS web site and Acrobat would politely load the data from the fdf files.
The new format mixes not line ends but UTF16 and ASCII encoding in the same file! Needless to say, BBEdit doesn't handle it well.
Here's what the first few lines look like when opened with BBEdit:
%FDF-1.2 %’“¦" 1 0 obj << /FDF << /Fields [ << /V (œ l i n e 1 4) /T (œ f 1 _ 0 5 8 \( 0 \)) >> << /V (œ l i n e 1 5) /T (œ f 1 _ 0 6 0 \( 0 \)) >> << /V (œ l i n e 6) /T (œ f 1 _ 0 4 2 \( 0 \))
Here's the hexdump of the same first few lines produced by BBEdit
It appears that the unescaped parentheses delimit blocks that are encoded as UTF16. Each block begins with an FEFF code point, which is surely a byte order mark. After that come 16-bit entries, the first byte of which is a null in every file I have looked at. The escaped parentheses are there because the author of the PDF used parentheses in his definitions of the form names. Note, though, that the backslash escape character is preceded by a null but the parenthesis following it is not.
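To make the byte layout described above concrete, here is a minimal Python sketch. It assumes the escaping is applied byte by byte after the text has been encoded as big-endian UTF16, which would explain why the backslash follows a null while the parenthesis after it has none. The function name `fdf_text_string` is made up for illustration and is not part of any Adobe tool.

```python
def fdf_text_string(text: str) -> bytes:
    """Encode text as a parenthesized string: BOM + UTF-16BE, byte-escaped.

    Assumption: a backslash is inserted in front of every 0x28 '(',
    0x29 ')' or 0x5C '\\' byte of the already-encoded data.
    """
    raw = b"\xfe\xff" + text.encode("utf-16-be")  # FEFF byte order mark first
    out = bytearray(b"(")                          # opening, unescaped paren
    for byte in raw:
        if byte in (0x28, 0x29, 0x5C):
            out.append(0x5C)                       # single-byte backslash
        out.append(byte)
    out.append(0x29)                               # closing, unescaped paren
    return bytes(out)


if __name__ == "__main__":
    # Show the bytes for a field name that contains parentheses.
    print(fdf_text_string("f1_058(0)").hex(" "))
```

If the assumption holds, the hex printed for `f1_058(0)` should show the same pattern as the hexdump: `00 5c 28` for the opening parenthesis in the name and `00 5c 29` for the closing one.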
So my question is: is there any way I can use BBEdit to post-process the plain ASCII files produced by my Excel macros and create a version with the mixed ASCII and UTF16? An AppleScript would be an easy way to go, and I care not a whit about speed. Can I tell BBEdit to change from UTF16 to ASCII and back as it writes a file? BBEdit uses UTF16 internally for everything. When it reads my fdf file, does it convert 0066 (an f) to 0000 0066? Or does it leave the 16 bits alone by effectively ignoring the null character in the file?
I have looked at reworking my VBA code, and that will be a pain. There seems to be no way to handle nulls inside a worksheet cell. Perl will probably handle the task, and I have started on that, but Perl's "use unicode" options are not helpful. There is also UNIX sed, which might work with a bunch of successive substitutions. Any other ideas? This is a once-a-year project and I really don't want to use C for it.
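For comparison with the Perl and sed routes, the post-processing step could be sketched in Python roughly as follows. This is only a sketch under several assumptions: the input is pure ASCII, every parenthesized string in it should become UTF16, the only escapes present are backslash followed by a single character, and the escaping of the output is done per byte. The names `convert_fdf` and `_utf16_literal` are hypothetical.

```python
import re

# A parenthesized string with backslash escapes and no nested bare parens.
_LITERAL = re.compile(rb"\((?:\\.|[^\\()])*\)", re.DOTALL)


def _utf16_literal(text: str) -> bytes:
    """BOM + UTF-16BE, then a backslash before each '(', ')' or '\\' byte."""
    raw = b"\xfe\xff" + text.encode("utf-16-be")
    escaped = re.sub(rb"[()\\]", lambda m: b"\\" + m.group(0), raw)
    return b"(" + escaped + b")"


def convert_fdf(data: bytes) -> bytes:
    """Rewrite every ASCII string literal in data as a UTF16 one."""
    def repl(match):
        body = match.group(0)[1:-1]
        if body.startswith(b"\xfe\xff"):
            return match.group(0)  # already converted, leave untouched
        # Assumption: escapes are only backslash + one character.
        plain = re.sub(rb"\\(.)", lambda e: e.group(1), body, flags=re.DOTALL)
        return _utf16_literal(plain.decode("ascii"))
    return _LITERAL.sub(repl, data)


if __name__ == "__main__":
    sample = b"<< /V (line14) /T (f1_058\\(0\\)) >>"
    print(convert_fdf(sample))
```

Reading and writing the file in binary mode (`open(path, "rb")` and `open(path, "wb")`) keeps Python from touching the nulls or the line ends. Whether Acrobat accepts the result is untested here.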
-- -> Stocks are getting pelloreid <-