Documentation for the NSF File Format 

: Domino Server / Data Storage and Management
: nsf, documentation, file format, pst, microsoft, open source, interoperability
: Troy Howard192 27 Oct 2009
As a software developer working with Lotus Notes, I would like to formally request the release of documentation for the NSF file format. Lotus Notes has a huge install base, and your customers have invested a lot of trust in your product, storing vital information within Notes databases.
If the format were documented, we could write our own tools to work directly against the data, removing some of the burden on the Lotus Notes development team to support enhancements to the published APIs.
One of your primary competitors -- Microsoft Outlook/Exchange -- has recently taken a bold step forward by releasing the documentation of their equivalent PST format. Much like Lotus Notes, and the NSF format, the only historical way of accessing data in a PST was via limited proprietary APIs.
Releasing the documentation for the PST format was a huge gesture of understanding from Microsoft and shows a lot of concern for their customer base, and for those who service and support their customers. It's that kind of display of concern that creates customer loyalty and encourages users to continue using their products.
Please step up to the plate and show your customers that you care about them as a company. Please provide documentation for the structure of your data files.
For reference, here's the Microsoft announcement of their intention to release the PST format specification.
-- update --
Microsoft completed this and has released the PST format documentation.
[MS-PST]: Outlook Personal Folders File Format (.pst) Structure Specification
Microsoft took another, even more amazing step. They are writing, and have released, a free and open source SDK for PST files. The person writing this has been the maintainer of the Outlook PST format since 2005. He's doing this on MS's payroll.
I started working on a .NET port for that:

1) Rob Goudvis8695 (02 Nov 2009)
I cannot imagine that this is what you really need to develop your own functions. What you need is a good and complete description of the C API. I would not care how the data is actually stored in a NSF-file. I would like to know what functions I have at my disposal to interact. Besides, opening the internal format would compromise the security.
2) Mark Demicoli10736 (10 Jan 2010)
Au contraire, Rob! The more information you have, the more likely you are to make correct decisions.

I can give you a simple example. There are likely to be structures in NSF files that would work faster if, say, notes were stored in a certain way.

So if I'm building a high volume web app, I would be empowered to think smarter.

More information is NEVER a handicap.
3) Mark Demicoli10736 (10 Jan 2010)
Why all the Demotes without comment? Looks to me like people reacting to an 'outsider' without thinking.
4) Philip Storry1467 (11 Feb 2010)
OK, I'll say why I'm demoting it...

The Idea says that the only way to access a database is via the COM API. That's not true.

We also have the C API. If you want to hit a database at a low level, that's the way to do it. The COM API apes the LotusScript classes and the Java classes, and that's on purpose. But the C API does no such thing, and is much more low-level.

APIs are better than documentation here. This is because an NSF file may not be all you need in order to get to the actual logical Notes database that the NSF is assumed to contain.

DAOS is a good example of a change which moves data outside of the NSF file. No amount of documentation can compensate for the fact that opening an NSF file which uses DAOS will not get you all the data - the file attachments will be missing.

IBM are making low rumblings about perhaps moving view indexes outside of the NSF file in future. And of course DB2 integration, whilst not being continued, shows that an NSF file may not be an NSF file in the future - IBM might offer us NSF-CouchDB in the same way that they've offered NSFDB2, for instance.

And worse, what if the database is encrypted? NSFs can be encrypted, of course. If you're going to try and open an NSF yourself but bypass the API, then you're going to have to reproduce the entire ACL/Document security model AND you're going to have to reproduce the cryptographic code so that you can encrypt/decrypt the NSF. Good luck with that!

To my mind, the evidence is clear, and it shows that the assumption you can open an NSF file and access the whole logical Notes database content is incorrect.

Therefore the API is the only safe way to handle NSFs.

If the C API doesn't do something you want it to, then by all means ask for additions to the C API.

But NSF documentation, despite being of incredible interest to me as a geek, is probably not a good thing. I can only see it leading to programs that, no matter how well written, have flawed assumptions and will therefore fail in too many cases.

Hence I'm voting this down.
5) Troy Howard192 (17 Jun 2010)
All -- I've updated the idea description to remove references to the COM API, and show the progress that MS has made on the PST format documentation.

I apologize up front for not being very careful about the details presented in my post here. My references to the COM API were somewhat out of context. Of course I'm aware of the C API. I have written extensive code using it for working with NSF data. However, there are still functions which are only available via the COM API automating the Notes application.

Regardless, the primary point of my request for documentation is that without documentation, users who have content stored in NSF data are forced to:
1. Have a license and installation of Lotus Notes on each machine, for the entire lifetime of that data.
2. Only use NSF data where Notes can be installed (there are lots of contexts where it can't be)
3. Only do the things the C API allows (granted, it's comprehensive)
4. Install the entire application, because the DLLs can't be redistributed on their own
5. Rely on black-box tools to recover data which is "corrupt", tools which give no details about the nature of the corruption and no options for partial recovery
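
To illustrate the last point, here is a minimal sketch of what a documented layout makes possible. It assumes a made-up toy container (each record is a length, a CRC32, then the payload), which is NOT the NSF layout; the names `pack_records` and `recover` are hypothetical. The only point is that a tool which knows the layout can say where the damage is, what kind it is, and salvage everything else:

```python
import struct
import zlib

# Toy record header: payload length, CRC32 of payload.
# This layout is an assumption for illustration; it is not how NSF stores data.
HEADER = struct.Struct("<II")

def pack_records(payloads):
    """Serialize payloads as consecutive [length][crc32][payload] records."""
    out = bytearray()
    for p in payloads:
        out += HEADER.pack(len(p), zlib.crc32(p)) + p
    return bytes(out)

def recover(blob):
    """Walk the records, keeping good ones and describing each bad one."""
    good, problems = [], []
    offset = 0
    while offset < len(blob):
        if offset + HEADER.size > len(blob):
            problems.append((offset, "truncated record header"))
            break
        length, crc = HEADER.unpack_from(blob, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(blob):
            problems.append((offset, "payload runs past end of file"))
            break
        payload = blob[start:end]
        if zlib.crc32(payload) != crc:
            # Damaged payload: report it, skip it, keep going.
            problems.append((offset, "checksum mismatch"))
        else:
            good.append(payload)
        offset = end
    return good, problems

# Build three records, then damage one byte inside the second payload.
blob = bytearray(pack_records([b"first note", b"second note", b"third note"]))
second_payload_start = HEADER.size + len(b"first note") + HEADER.size
blob[second_payload_start] ^= 0xFF

good, problems = recover(bytes(blob))
for offset, reason in problems:
    print(f"record at byte {offset}: {reason}")
print(f"recovered {len(good)} of 3 records")
```

A black-box repair tool can do the same work internally, but without the layout nobody outside the vendor can check its verdict or choose a different recovery strategy.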

To speak specifically to the posted comments:

* "opening the internal format would compromise the security."
- Security through obscurity is silly. Encrypted data is still safe, even if we know how it's stored. Without the key to decode the data, it doesn't matter how you've accessed the encoded data, you still won't be able to decode it.
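
A small sketch of that argument, under loud assumptions: the container layout below (16-byte nonce, 32-byte tag, ciphertext) is invented for this example and fully "documented" right here, and the cipher is a toy keystream built from SHA-256 purely so the example runs with the standard library. It is not how Notes encrypts anything and is not something to use for real data. The names `seal` and `open_sealed` are hypothetical:

```python
import hashlib
import hmac
import os

def _subkey(key, label):
    # Derive separate keys for encryption and authentication.
    return hashlib.sha256(label + key).digest()

def _keystream(key, nonce, n):
    # Toy keystream: SHA-256 over key, nonce and a counter. Illustration only.
    out = bytearray()
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:n])

def seal(key, plaintext):
    """Return nonce (16 bytes) + HMAC tag (32 bytes) + ciphertext."""
    nonce = os.urandom(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ct = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    return nonce + tag + ct

def open_sealed(key, blob):
    """Parse the public layout, then refuse to decode without the right key."""
    nonce, tag, ct = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or corrupted data")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ct))
    return bytes(a ^ b for a, b in zip(ct, stream))

blob = seal(b"the real key", b"vital information")
print(open_sealed(b"the real key", blob))
try:
    open_sealed(b"a guessed key", blob)
except ValueError as err:
    print("without the key:", err)
```

Anyone can split the blob into its three fields because the layout is public; what they cannot do is produce the plaintext without the key.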

* "If the C API doesn't do something you want it to, then by all means ask for additions to the C API."
- This is not a good process for anybody. Lotus would be forced to expend resources to publish the API, and ensure that the extension is generalized, backward compatible, and then have to support it. I might need to do something very specific, and it makes sense that I am both responsible for writing the code and supporting the code, and not making Lotus do my work for me.

Beyond that, Lotus/IBM has already shown that they basically don't respond to these kinds of requests. There are numerous citable situations where the community has asked repeatedly for a feature to be made available, and has been told in no uncertain terms "we aren't going to do that". Generally the justification for that is "there's no way to do what you're asking and satisfy all of our customers in the process, so we're just not going to bother"... or ... "that's not important enough for us to waste effort on it". Not important to Lotus for their product strategy, but possibly very important to the end users asking for it.

* Data stored outside the NSF file (DAOS, view indexes, NSFDB2)
- This is not really relevant. We're interested in content stored in the NSF, not all the possible ways of storing content elsewhere. If it's not in the file, it's not in the file... so what? In the end, it'd be great to have documentation for *all the places data is stored*, including NSF, and the mechanisms for linking it together. NSF by itself is a good start.

* Encryption / ACL / Security
- As I said above, security through obscurity is ineffective. Encryption is not that complicated to implement. Security and ACL are application level concerns, not data storage level concerns, so not really relevant.

* "I can only see it leading to programs that, no matter how well written, have flawed assumptions and will therefore fail in too many cases."
- Quality of third party code is up to the engineers writing it. Can't really comment or assume much there. There are lots of excellent software engineers who do not work for IBM or on the Lotus Notes team.

I can say that the upper level of quality will be directly related to the quality of documentation provided. There should be no need for assumptions, as the details should all be clearly and unambiguously explained in documentation.

6) John Foldager1155 (26 Nov 2010)
Would like a low level Java API to the NSF. Don't need the NSF file format... as described above, if IBM makes changes then your application, which needs an NSF laid out in a certain way, would fail.

