.NET – Access an XML file from inside a ZIP file

Java has a lovely feature: you can point it at an XML file using a URL such as zip:c:\program\data\info.zip!folder\MyData.xml, and it will open the file info.zip and use the file MyData.xml inside it as the XML data file.

We have added this functionality to .NET by extending XmlUrlResolver.

It’s actually pretty simple to do this. The class XmlZipResolver inherits from XmlUrlResolver, so wherever you previously created an XmlUrlResolver object to access an XML file, you create an XmlZipResolver object instead and treat it exactly as you would an XmlUrlResolver object. It works for any URL that XmlUrlResolver will handle; the additional code is used only if the URL starts with zip: or jar: (a Java jar file is a zip file).

The key part is in the call to GetEntity, which opens the zip file and then gets a stream to the requested XML file inside it. This code uses SharpZipLib for all zip file access.

// Requires: using System; using System.IO; using System.Net;
// using System.Text.RegularExpressions; using ICSharpCode.SharpZipLib.Zip;
public override object GetEntity(Uri absoluteUri, string role, Type ofObjectToReturn)
{
    // Test whether the URL starts with "jar:" or "zip:"
    String uriString = absoluteUri.ToString();
    Match zipMatch = Regex.Match(uriString, zipRegex);

    // If not a zip uri, pass the URL to XmlUrlResolver
    if (!zipMatch.Success)
        return base.GetEntity(absoluteUri, role, ofObjectToReturn);

    // Strip out leading "zip:" or "jar:"
    String strippedUriString = Regex.Replace(uriString, zipRegex, "");

    // Split into two strings, before and after the "!"
    Match uriMatch = Regex.Match(strippedUriString, uriRegex);
    if (!uriMatch.Success)
        throw new UriFormatException("Zip URI does not have a '!' between the zip file path and " +
            "the path to the xml within the zip file, or path to xml does not " +
            "start with a '/'. Zip URI found was '" + uriString + "'");

    String zipFilePath = uriMatch.Groups[1].ToString();
    String xmlInZipPath = uriMatch.Groups[2].ToString();
    // Zip entry names have no leading slash, so drop one if present
    if (xmlInZipPath.StartsWith("/") || xmlInZipPath.StartsWith("\\"))
        xmlInZipPath = xmlInZipPath.Substring(1);

    // Use the first string as a ZIP file, use the second as a ZIP path to a file
    using (WebClient reader = new WebClient())
    {
        // need to do this for the self: case in particular
        Stream zipFileStream = null;
        ZipFile zipFile = null;
        try
        {
            // Ordinal comparison avoids culture-specific lowercasing of "FILE:"
            if (zipFilePath.StartsWith("file:", StringComparison.OrdinalIgnoreCase))
                zipFileStream = new FileStream(zipFilePath.Substring(5), FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
            else
                zipFileStream = reader.OpenRead(zipFilePath);
            zipFile = new ZipFile(zipFileStream);
            ZipEntry entry = zipFile.GetEntry(xmlInZipPath);
            if (entry == null)
                throw new FileNotFoundException(
                    String.Format("could not find the xml file {0} in the zip file {1}", xmlInZipPath, zipFilePath), xmlInZipPath);

            // Return the stream to that entry; the StreamPair now owns the zip file and its stream
            return new StreamPair(zipFile, zipFileStream, zipFile.GetInputStream(entry));
        }
        catch (Exception)
        {
            // On any failure, release whatever was opened before rethrowing
            if (zipFileStream != null)
                zipFileStream.Close();
            if (zipFile != null)
                zipFile.Close();
            throw;
        }
    }
}
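The listing refers to zipRegex and uriRegex without showing them. The sketch below is one guess at patterns that would match the behavior the comments and the error message describe. Both patterns, the class name ZipUriSketch, and the TrySplit helper are assumptions made for illustration and are not taken from the Kailua source. In particular, the split pattern here requires a slash after the "!", which is what the error message asks for; the real pattern may be more lenient.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipUriSketch
{
    // Hypothetical patterns; the real zipRegex and uriRegex are not shown in the post.
    private const string ZipPrefixPattern = @"^(zip|jar):";
    private const string SplitPattern = @"^(.+)!([/\\].+)$";

    // Returns false when the URI has no zip:/jar: prefix, or no "!" followed by a slash.
    public static bool TrySplit(string uriString, out string zipFilePath, out string entryPath)
    {
        zipFilePath = null;
        entryPath = null;

        if (!Regex.IsMatch(uriString, ZipPrefixPattern, RegexOptions.IgnoreCase))
            return false;

        // Strip the leading "zip:" or "jar:".
        string stripped = Regex.Replace(uriString, ZipPrefixPattern, "", RegexOptions.IgnoreCase);

        // Group 1 is everything before the "!", group 2 is the path after it.
        Match m = Regex.Match(stripped, SplitPattern);
        if (!m.Success)
            return false;

        zipFilePath = m.Groups[1].Value;
        // Drop the leading slash, since zip entry names do not carry one.
        entryPath = m.Groups[2].Value.Substring(1);
        return true;
    }
}
```

Under these assumed patterns, a call such as ZipUriSketch.TrySplit("zip:file:c:/data/info.zip!/folder/MyData.xml", out zip, out entry) would leave "file:c:/data/info.zip" in zip and "folder/MyData.xml" in entry, while a plain http: URL would return false and fall through to the base resolver.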

After this everything is pretty straightforward: all the calls to member functions are served from the stream of the embedded XML file. Because GetEntity() returns an object, if it returns the object produced by the base XmlUrlResolver then the methods in this class are not called. Therefore all remaining member functions are written specifically for the case of a file in a zip.

The one other item of note is that the Stream returned is an object that holds three objects: the ZipFile, the Stream that is the zip file, and the stream that is the zip entry. This returned object inherits from Stream. For every call except Close() it just passes the call through to the zip entry stream object. But on a Close (and therefore indirectly on a Dispose), it closes all three objects.
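The wrapper described above can be sketched roughly as follows. This is not the StreamPair class from the Kailua source, which the post does not show. OwningStream is a hypothetical name, and to stay free of the SharpZipLib dependency it takes the extra objects as plain IDisposable instances rather than as a ZipFile and a Stream.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for the post's StreamPair: forwards every call to the
// entry stream and releases the owning resources when it is closed.
public sealed class OwningStream : Stream
{
    private readonly Stream entryStream;
    private readonly IDisposable[] owners;

    public OwningStream(Stream entryStream, params IDisposable[] owners)
    {
        this.entryStream = entryStream ?? throw new ArgumentNullException(nameof(entryStream));
        this.owners = owners ?? new IDisposable[0];
    }

    // Everything except disposal is passed straight through to the entry stream.
    public override bool CanRead => entryStream.CanRead;
    public override bool CanSeek => entryStream.CanSeek;
    public override bool CanWrite => entryStream.CanWrite;
    public override long Length => entryStream.Length;
    public override long Position
    {
        get => entryStream.Position;
        set => entryStream.Position = value;
    }

    public override void Flush() => entryStream.Flush();
    public override int Read(byte[] buffer, int offset, int count) => entryStream.Read(buffer, offset, count);
    public override long Seek(long offset, SeekOrigin origin) => entryStream.Seek(offset, origin);
    public override void SetLength(long value) => entryStream.SetLength(value);
    public override void Write(byte[] buffer, int offset, int count) => entryStream.Write(buffer, offset, count);

    // Stream.Close() and Stream.Dispose() both end up in Dispose(true),
    // so this one override covers both paths.
    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            entryStream.Dispose();
            foreach (IDisposable owner in owners)
                owner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Overriding Dispose(bool) rather than Close() means a caller that wraps the returned stream in a using block releases the zip file as well, which matches the "indirectly on a Dispose" behavior described above. With SharpZipLib the call might look like new OwningStream(zipFile.GetInputStream(entry), zipFile, zipFileStream), assuming the ZipFile type in use implements IDisposable.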

The source file is part of the Kailua OpenSource project (created by Windward Reports – awesome reporting software) and is available at The Kailua ADO.NET wrapper. This is based on a shorter blog entry at Useful XML .NET classes.
