Doc 1: Part 3 … Back to Romania!

Edit 1st May 2018:

  • The last binary section contains a timestamp giving GMT+3
  • Since I wrote this I’ve realised that this *can* be faked by either altering the computer clock on boot, using a virtual machine with an altered timezone, or (in Linux anyway) typing “TZ=UTC-3” before a script command (POSIX TZ strings invert the sign, so UTC-3 means three hours ahead of GMT).
  • To me it seems likely that this was the reason *why* G2 went to all the trouble of altering the documents in this way.
  • Hot off the presses, The Forensicator has an awesome breakdown of the steps required to change the document in this way.
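
As an illustration of the TZ trick mentioned in the edit notes, here is a minimal Python sketch of how a per-process timezone override changes the local time a program reports without touching the real clock. It assumes a Unix-like system (`time.tzset` does not exist on Windows). The helper name `local_time_under_tz` is made up for this example, and the epoch value 1465988920 is simply 2016-06-15 11:08:40 UTC expressed in Unix seconds. Note that POSIX-style TZ strings invert the sign, so "UTC-3" means three hours ahead of UTC.

```python
import os
import time

def local_time_under_tz(epoch_seconds, tz_string):
    """Format a UTC epoch as local time under a temporary TZ override (Unix only)."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz_string
    time.tzset()  # make this process re-read the TZ variable
    try:
        return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(epoch_seconds))
    finally:
        # Restore the previous setting so the override does not leak
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()

# 1465988920 is 2016-06-15 11:08:40 UTC
# POSIX sign convention is inverted: "UTC-3" behaves as GMT+3
print(local_time_under_tz(1465988920, "UTC-3"))
```

Any program started with that environment variable would stamp its output with the shifted local time, which is why a local-time field on its own proves little.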

The last binary section is common to all the altered .doc files. Thus it’s the only section we can be sure was created by Guccifer2.0’s computer. For example, here’s 1.doc:

[Screenshot: last binary section of 1.doc]

and here’s 2.doc:

[Screenshot: last binary section of 2.doc]

Identical. Even though the authors of the two documents are different and the files are different, the datastore is common not just to docs 1 & 2 but to all the numbered documents. The only common thing is: Guccifer2.0.

This is the only verifiable breadcrumb from Guccifer2.0 in the documents.

Datastores like this are created by WORD in order to help the user by preserving information, and they are essentially a mini WORD document. We can see the magic characters d0cf11e0a1b11ae1 that are the starting hex common to all MS WORD .docs. They are difficult to fake, as one missed or badly calculated byte can result in the whole document corrupting or not opening, so faking one takes a lot of effort. And in this case, there seems little reason to fake them. In my opinion it was created automatically, possibly without G2.0. even knowing it was.

Copying to a text file, and converting to binary allows the reading of it in a hex-viewer. The first few lines are:

[Screenshot: first lines of the datastore in a hex viewer]

Here we see that the datastore was created via MSXML2 SAXXMLReader 5.0, and next we have the characteristic “magic” for a MSWORD document, beginning “D0C”. (How the long winter evenings must have flown «Chez Famille Gates» when they managed to make the first hex for a .doc spell “D0C”).

From D0C onwards we essentially have a mini WORD document. The same rules for file offsets, little-endianness, etc. apply as for a regular full-sized WORD document. So here we have “03 00 FE FF 09”: 03 00 is the major version of the compound-file format (version 3, not a Windows version), FE FF is the byte-order mark for little-endian, and 09 is the sector shift, which gives a sector size of 2^9 = 512 bytes. Read more details on Microsoft’s (for once) excellent blog.
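
For anyone who wants to check those header bytes in code rather than by eye, here is a rough Python sketch. The field layout (8-byte signature, 16-byte CLSID, then 16-bit minor version, major version, byte order, sector shift and mini sector shift) is my reading of Microsoft's compound file documentation. The sample header is hypothetical: only the signature and “03 00 FE FF 09” come from the datastore, while the zeroed CLSID, the minor version 3E 00 and the mini sector shift 06 00 are typical values filled in as assumptions. The function name `parse_cfb_header` is made up.

```python
import struct

MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Hypothetical sample header: signature + zeroed CLSID + five 16-bit fields.
# Only the signature and "03 00 FE FF 09" are taken from the datastore itself.
header = MAGIC + bytes(16) + bytes.fromhex("3E00" "0300" "FEFF" "0900" "0600")

def parse_cfb_header(data):
    """Decode the fixed fields at the start of a compound file header."""
    # The five 16-bit little-endian fields start at offset 24
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 24)
    return {
        "signature_ok": data[:8] == MAGIC,
        "major_version": major,
        "little_endian": byte_order == 0xFFFE,
        "sector_size": 1 << sector_shift,      # 2 ** 9 = 512
        "mini_sector_size": 1 << mini_shift,
    }

info = parse_cfb_header(header)
print(info)
```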

Screenshot from 2018-02-16 08-43-59

Working out the file offsets and directory locations is somewhat academic for such a short document – they are easy to find with mark-one eyeball. For example the root “directory” is just a few lines away:

[Screenshot: the root “directory”]

Next up we have a UUID:

[Screenshot: the UUID bytes]

Which, once the pattern for converting a hex UUID to a human-readable UUID is worked out (took me an embarrassing amount of time), works out as the UUID for MSXML2 SAXXMLReader 5.0:

88d969ec-8b8b-4c3d-859e-af6cd158be0f
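
The hex-to-readable pattern can be sketched in Python. Windows stores the first three UUID fields little-endian and the last eight bytes in reading order, and the standard library's `uuid.UUID(bytes_le=...)` performs exactly that swap. One caveat: the raw bytes below are an assumption, reconstructed backwards from the UUID quoted above rather than copied out of the hex dump.

```python
import uuid

# Assumed on-disk bytes, reconstructed from the quoted UUID (not read from the dump)
raw = bytes.fromhex("EC69D9888B8B3D4C859EAF6CD158BE0F")

# bytes_le reverses the first three fields (4 + 2 + 2 bytes) and keeps the rest as-is
clsid = uuid.UUID(bytes_le=raw)
print(clsid)
```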

This means that G2.0. has MSXML SDK installed on his computer.

Whether or not it’s a legitimate copy we don’t know; he’s a criminal after all. But if it is legitimate somewhere in the depths of the Microsoft Licensing Database there is an agreement for MSXML SDK with that UUID in Guccifer’s name….

Next we come to – dun dun duuurrr – a timestamp. And the words MSO Datastore:

[Screenshot: the timestamp and “MSO Datastore”]

The timestamp bit is in Win32FileTime format which counts the number of “100-nanosecond intervals since January 1, 1601 (UTC)”. It’s not exactly UTC because it doesn’t count leap-seconds, but yeah it pretty much is.

So:

  • We know that this time will be in UTC.
  • Therefore if the rtf time is different (provided of course it isn’t fake, and that it’s local time) then a timezone can be calculated.

The FileTime appears as:

C0 C2 C9 45 F6 C6 D1 01

It also appears four more times (5 total) in the datastore. All exactly the same. So as this time is good to 100-nanosecond resolution we can say this was created automatically by G2.0.’s computer, or he has lightning-fast fingers. It’s highly unlikely to be fake in my opinion. It’s possible of course, but requires orders of magnitude more effort to fake – and for what purpose? The time is consistent with the rtf time (although the hour is different), and consistent with the upload day to the blog. There’s no reason to fake it.

Next we have the time repeated twice (create, modify) then the first MSODatastore folder, with a name that took me forever to find out what it was:

[Screenshot: the MSODatastore folder name]

I can’t admit to how many hours I spent on trying to work out “47 00 51 00 …” It’s highly embarrassing and more than a little sad. It was clear that it was Base64 encoded, or some sort of encoding, as we have the “==” padding at the end. I really tried every rare and exotic encoding, from every Base to into/out-of every known language encoding, looking for what it was. Was it a name? A file destination? The problem was it wasn’t documented anywhere how this folder name was derived. All the MS docs said was that it had to be “unique”.

Eventually I admitted my ignorance and cried for help on the internets and a programming God named Dirk steered me right. It was “simply” a non-standard Base-64 table that MS uses. Trust them to do it a bizarre way. The string decodes to a UUID for the custom XML storage item, which we’ll see below.

[Screenshot: the custom XML item]

The whole of the datastore was created as somewhere to put this bibliography schema reference! It’s a remnant of the original custom XML from the Podesta docx (see part one) which was:

<b:Sources SelectedStyle="\APA.XSL" StyleName="APA" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"></b:Sources>

and

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<ds:datastoreItem ds:itemID="{7394DE7E-FAED-42D7-8057-7F55D3010E52}" xmlns:ds="http://schemas.openxmlformats.org/officeDocument/2006/customXml"><ds:schemaRefs><ds:schemaRef ds:uri="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"/></ds:schemaRefs></ds:datastoreItem>

Guccifer Sleuths may compare the UUID of G2.0.’s version: 8DC20819-F3E2-4B41-AB13-0098F82D255A with that of the original: 7394DE7E-FAED-42D7-8057-7F55D3010E52 and note the difference. I’m not sure it has any relevance to anything. I think it’s just a UUID that’s generated on the fly as, just what it says, a: “universally unique ID”.

It’s this that the mystery string encodes in its non-standard Base64, causing hours of heartache to yours truly.

(Talking Bibliographies; I’d be grateful for someone else to check if there is a listed bibliography in both the Podesta docx and 1.doc. I get one but it’s one regarding LibreOffice books, which I run so it may be just on my machine).

After that there’s nothing else notable that I can find. Sadly no username..!

The Timestamp:

From the MS docs we know that C0 C2 C9 45 F6 C6 D1 01 is stored little-endian, in LowTime/HighTime order. Thus it splits into C0 C2 C9 45 : F6 C6 D1 01 as LowTime : HighTime, and when we reverse the bytes of each half into normal reading order we get:

C0 C2 C9 45 : F6 C6 D1 01 = 45 C9 C2 C0 : 01 D1 C6 F6

I’m no programmer (but found folks that are) so I’d appreciate a double check. In Python using that stackoverflow thread I get a sane answer with:

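import struct

# LowTime:HighTime, each half already byte-reversed into reading order
ft = "45C9C2C0:01D1C6F6"
h2, h1 = [int(h, base=16) for h in ft.split(':')]
# Pack HighTime then LowTime as two 32-bit words and read them back as one 64-bit integer
ft_dec = struct.unpack('>Q', struct.pack('>LL', h1, h2))[0]
print(ft_dec)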

Which should result in the number of 100-nanosecond intervals since January 1, 1601 UTC:

131104625205560000
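
As a cross-check that skips the manual byte swapping, here is a short sketch using Python 3's `int.from_bytes`, which reads the eight bytes exactly as they sit in the file. This is an alternative route to the same number, not the method from the stackoverflow thread.

```python
# The eight bytes exactly as they appear in the datastore
raw = bytes.fromhex("C0 C2 C9 45 F6 C6 D1 01")

# Interpret them as one little-endian unsigned 64-bit integer
ft_dec = int.from_bytes(raw, byteorder="little")
print(ft_dec)
```

Both routes should agree, since splitting into LowTime and HighTime and reversing each half is just a little-endian 64-bit read done by hand.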

[Screenshot: Python output]

Then we can convert this FileTime to a UTC date and time. Two methods, just to be sure:

Firstly via Reliablybroken.com’s python tool:

print(filetime_to_dt(ft_dec))

Result (… drumroll please Maestro .. ):

2016-06-15 11:08:40.556000 UTC
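
If you'd rather not depend on a third-party tool, the same conversion can be sketched with the standard library alone: add the tick count, scaled to microseconds, to 1 January 1601 UTC. The function name `filetime_to_utc` is made up for this example and is not the Reliablybroken function.

```python
from datetime import datetime, timedelta, timezone

# FileTime counts 100-nanosecond ticks from this instant
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def filetime_to_utc(ft):
    """Convert a Win32 FileTime tick count to a timezone-aware UTC datetime."""
    # 10 ticks per microsecond; integer division drops the sub-microsecond remainder
    return FILETIME_EPOCH + timedelta(microseconds=ft // 10)

print(filetime_to_utc(131104625205560000))
```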

Then via epochconverter.com on the web, result (.. cymbal please Maestro! …):

[Screenshot: epochconverter.com result]

11:08:41 GMT

Again 11:08 GMT! But the metadata in the .rtf says 14:08:

{\creatim\yr2016\mo6\dy15\hr13\min38}

{\revtim\yr2016\mo6\dy15\hr14\min8}

{\printim\yr2016\mo6\dy15\hr13\min45}

14:08 local – 11:08 GMT = 3 hours, i.e. GMT + 3
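
The subtraction can be written out as a small Python sketch. It pulls the fields out of the `\revtim` group with a regular expression and compares them with the UTC time from the FileTime at minute resolution, because the RTF field stores no seconds. The function name `parse_rtf_time` is made up, and the result is only a timezone offset if the RTF time really is honest local time.

```python
import re
from datetime import datetime

rtf_revtim = r"{\revtim\yr2016\mo6\dy15\hr14\min8}"
utc_time = datetime(2016, 6, 15, 11, 8, 40)  # from the FileTime, fraction dropped

def parse_rtf_time(group):
    """Pull yr/mo/dy/hr/min out of an RTF time group (RTF stores no seconds)."""
    fields = {k: int(v) for k, v in re.findall(r"\\(yr|mo|dy|hr|min)(\d+)", group)}
    return datetime(fields["yr"], fields["mo"], fields["dy"], fields["hr"], fields["min"])

local_time = parse_rtf_time(rtf_revtim)
# Zero the seconds on the UTC side so both values have the same resolution
offset_hours = (local_time - utc_time.replace(second=0)).total_seconds() / 3600
print(offset_hours)
```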

This all depends a bit on the unknown: can we trust the times in the .rtf file? I err on the side that we can for the reasons I’ve already outlined, viz:

  1. Why fake them?
  2. The document needed to have a filetime consistent with the publication on his website – these are.
  3. Edit 18/2/18: As the rtf version’s minute is consistent with the W32 minute, to achieve this he’d have to:
    1. save the doc,
    2. open it in a hex editor,
    3. calculate the Win32FileTime,
    4. and carefully change the rtf minute number in a plain text editor. (Opening in a word-processor would change the time, and he’d have to start over).
    5. For each document.
    6. It’s doable, but from the file upload times, he’d have to do each of them in just a few minutes. A crazy amount of effort for very little reason.

On the flipside to that:

  1. He may be a sad nerd that suspected that some other sad nerd would spend days searching for a Win32Filetime.

Guess where the GMT+3 timezone on 15th June 2016 is… ?:

[Image: Romania]

Sanity Check: Repeating the exercise with 2.doc:

The filetime is listed as 50 8F 40 A9 F6 C6 D1 01 (search here) and as before is little-endian, which we convert to: A9 40 8F 50 : 01 D1 C6 F6. Running in Python we get:

[Screenshot: Python output for 2.doc]

131104626874290000 (100-nanosecond intervals since January 1, 1601 UTC)

Running through epochconverter.com gives us:

[Screenshot: epochconverter.com result for 2.doc]

11:11:27 GMT

Comparing with the rev-time in the metadata of 2.doc which is:

{\revtim\yr2016\mo6\dy15\hr14\min11}

14:11 local – 11:11 GMT = 3 hours, i.e. GMT + 3
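
One more consistency check, sketched in Python: decode both FileTimes and measure the gap between them. The rtf rev-times are three minute-marks apart (14:08 and 14:11), so the two UTC timestamps should fall in minutes :08 and :11 as well. The helper name `decode` is made up for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH_1601 = datetime(1601, 1, 1, tzinfo=timezone.utc)

def decode(hex_bytes):
    """Raw little-endian FileTime bytes (as hex) to a UTC datetime."""
    ticks = int.from_bytes(bytes.fromhex(hex_bytes), "little")
    return EPOCH_1601 + timedelta(microseconds=ticks // 10)

doc1 = decode("C0 C2 C9 45 F6 C6 D1 01")  # 1.doc
doc2 = decode("50 8F 40 A9 F6 C6 D1 01")  # 2.doc

# Seconds elapsed between the two datastore timestamps
gap = (doc2 - doc1).total_seconds()
print(doc1.time(), doc2.time(), gap)
```

The gap comes out at a little under three minutes, which lines up with the minute fields in the two rev-times.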

Romania was a busy place in June 2016….

[Screenshots: ElectionLeaks]

23 thoughts on “Doc 1: Part 3 … Back to Romania!”

  1. Interesting parsing. However, in fairness, St Petersburg and Moscow also GMT+3 in June 2016 (Russia did not use Daylight time in 2016) so result doesn’t distinguish between Romania and Russia.


  2. Very interesting 🙂
    Of course you can also set the timezone on the PC you are working on to Moscow time before you start creating documents.

    MSXML 5.0 ships with Microsoft Office 2003 and Microsoft Office 2007. These documents were made in DOC (= essentially RTF) format, the originals were in DOCX format. Some (all?) of the originals will be linked to MSXML 6 (newer OS and/or later than Office 2007, such as Office 2010).
    I think the reason for using DOC format is that G2 only had a MS Word 2003 version.

