Introduction
Hi guys, let's explore a powerful automation for archiving large libraries using PnP with CSOM PowerShell. The approach can be run automatically, semi-automatically, or manually, and on a weekly, monthly, or quarterly schedule, depending on your business and archival needs.
Pre-Requisites
Install the necessary DLL files by going to this
link >> Click Download >> Select the latest file. Once the installation is done, you will see a few DLL files automatically placed on your local path, such as:
"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"
"C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"
Also, install SharePointPnPPowerShellOnline.msi by going to this
link.
Maintain the same columns, data types, and views in both the source library and the target library (typically the default AllDocuments.aspx view).
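As a quick sanity check that the two libraries line up, a PnP sketch like the one below can compare field internal names (the site URLs and library names are the sample values used in the scripts in this article; substitute your own):

```powershell
#Connect to the source site and collect its library's field internal names
$Cred = Get-Credential
Connect-PnPOnline -Url "https://samplesharenet.sharepoint.com/sites/classictest" -Credentials $Cred
$SourceFields = Get-PnPField -List "PnPCopytoLib" | Select-Object -ExpandProperty InternalName

#Connect to the target site and do the same for the archive library
Connect-PnPOnline -Url "https://samplesharenet.sharepoint.com/sites/testsitearchival" -Credentials $Cred
$TargetFields = Get-PnPField -List "FlowArchiveLib2" | Select-Object -ExpandProperty InternalName

#Any names printed here exist in the source but are missing from the target library
$SourceFields | Where-Object { $_ -notin $TargetFields }
```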
PowerShell Scripts Used
The library scan [LibScan.ps1] walks all nested folders and files and selects those that either have the Archive flag manually set to 'True', or are 1 year old (or n days old, as per your archival strategy).
```powershell
#Load SharePoint CSOM Assemblies
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"

#Config Parameters
$SiteURL = "https://samplesharenet.sharepoint.com/sites/classictest"
$ListName = "PnPCopytoLib"
$CSVPath = "D:\LibraryDocumentsInventory.csv"

#Get Credentials to connect
$Cred = Get-Credential

Try {
    #Setup the context
    $Ctx = New-Object Microsoft.SharePoint.Client.ClientContext($SiteURL)
    $Ctx.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Cred.UserName, $Cred.Password)

    #Get the Document Library
    $List = $Ctx.Web.Lists.GetByTitle($ListName)

    #CAML Query: files only (FSObjType = 0) that are flagged for archival OR created more than 365 days ago
    $Query = New-Object Microsoft.SharePoint.Client.CamlQuery
    $Query.ViewXml = "<View Scope='RecursiveAll'><Query><Where><And><Eq><FieldRef Name='FSObjType'/><Value Type='Integer'>0</Value></Eq><Or><Eq><FieldRef Name='ArchivalFlag'/><Value Type='Choice'>Yes</Value></Eq><Lt><FieldRef Name='Created'/><Value Type='DateTime' IncludeTimeValue='True'>" + (Get-Date).AddDays(-365).ToString("yyyy-MM-ddTHH:mm:ssZ") + "</Value></Lt></Or></And></Where></Query></View>"

    #Get all matching documents in the library
    $ListItems = $List.GetItems($Query)
    $Ctx.Load($ListItems)
    $Ctx.ExecuteQuery()

    $DataCollection = @()
    #Iterate through each document in the library
    ForEach($ListItem in $ListItems) {
        #Collect data
        $Data = New-Object PSObject -Property ([Ordered] @{
            FileName    = $ListItem.FieldValues["FileLeafRef"]
            RelativeURL = $ListItem.FieldValues["FileRef"]
            CreatedBy   = $ListItem.FieldValues["Created_x0020_By"]
            CreatedOn   = $ListItem.FieldValues["Created"]
            ModifiedBy  = $ListItem.FieldValues["Modified_x0020_By"]
            ModifiedOn  = $ListItem.FieldValues["Modified"]
            FileSize    = $ListItem.FieldValues["File_x0020_Size"]
        })
        $DataCollection += $Data
    }
    $DataCollection

    #Export Documents data to CSV
    $DataCollection | Export-Csv -Path $CSVPath -Force -NoTypeInformation
    Write-Host -f Green "Documents Data Exported to CSV!"
}
Catch {
    Write-Host -f Red "Error:" $_.Exception.Message
}
```
Output
We shall get the list of files to be archived, as per the conditions in the Where clause above, at your local path: D:\LibraryDocumentsInventory.csv
It should contain the following columns:
- FileName
- RelativeURL
- CreatedBy
- CreatedOn
- ModifiedBy
- ModifiedOn
- FileSize
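Before running the copy script, a quick look at the exported inventory can confirm the scan picked up what you expect. A small sketch, using the CSV path from the script above:

```powershell
#Count the files selected for archival and preview the first few rows
$Inventory = Import-Csv "D:\LibraryDocumentsInventory.csv"
"{0} file(s) queued for archival" -f $Inventory.Count
$Inventory | Select-Object FileName, RelativeURL, ModifiedOn -First 5 | Format-Table -AutoSize
```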
Copying all the scanned files, with their relative folder paths, from the source library to the archive target library [CopyFiles.ps1]:
```powershell
#Load SharePoint CSOM Assemblies
CLS
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.dll"
Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\16\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"

#Function to Copy a File
Function Copy-SPOFile([String]$SourceSiteURL, [String]$SourceFileURL, [String]$TargetFileURL) {
    Try {
        #Setup the context
        $Ctx = New-Object Microsoft.SharePoint.Client.ClientContext($SourceSiteURL)
        $Ctx.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials($Credentials.UserName, $Credentials.Password)

        #Copy the File
        $MoveCopyOpt = New-Object Microsoft.SharePoint.Client.MoveCopyOptions
        $Overwrite = $True
        [Microsoft.SharePoint.Client.MoveCopyUtil]::CopyFile($Ctx, $SourceFileURL, $TargetFileURL, $Overwrite, $MoveCopyOpt)
        $Ctx.ExecuteQuery()
        Write-Host -f Green $TargetFileURL "- File Copied Successfully!"
    }
    Catch {
        Write-Host -f Red "Error Copying the File!" $_.Exception.Message
    }
}

$RootSiteURL = "https://samplesharenet.sharepoint.com"
$SourceSiteURL = "https://samplesharenet.sharepoint.com/sites/classictest"
$TargetSiteURL = "https://samplesharenet.sharepoint.com/sites/testsitearchival"
$SourceSitePath = "/sites/classictest/"
$TargetSitePath = "/sites/testsitearchival/"
$SourceDocLibURL = "/PnPCopytoLib"
$TargetDocLibURL = "/FlowArchiveLib2"
#$TargetDocLibURL = "/FlowArchiveLib2/April_3rdWeek" for any batch-wise archivals

$Credentials = Get-Credential
Connect-PnPOnline -Url $TargetSiteURL -Credentials $Credentials

$CSVPath = "D:\LibraryDocumentsInventory.csv"
Import-Csv $CSVPath | ForEach-Object {
    $SourceFileURL = $RootSiteURL + $_.RelativeURL
    $temp = ($_.RelativeURL).Replace($SourceDocLibURL, $TargetDocLibURL)
    $temp = ($temp).Replace($SourceSitePath, $TargetSitePath)
    $TargetFileURL = $RootSiteURL + $temp
    $temp = ($temp).Replace($TargetSitePath, "")
    $temp = ($temp).Replace("/" + $_.FileName, "")
    #Ensure the relative folder structure exists in the target library
    if ($TargetDocLibURL -ne "/" + $temp) {
        Resolve-PnPFolder -SiteRelativePath $temp
    }
    #Call the function to Copy the File
    Copy-SPOFile $SourceSiteURL $SourceFileURL $TargetFileURL
}
```
Just supply your own values for the parameters highlighted above.
Precautions:
Use only Site Collection Admin/Global Admin login details while the script is running.
To avoid any manual input from end users, hard-code the credentials in the above scripts using a password-protected secure string (token-style management).
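A minimal sketch of that secure-string approach is below; the credential file path and admin UPN are assumptions, and note that ConvertFrom-SecureString encrypts with the Windows Data Protection API, so the file can only be decrypted by the same user on the same machine:

```powershell
#One-time step: store the admin password as an encrypted secure string
Read-Host "Enter the admin password" -AsSecureString |
    ConvertFrom-SecureString | Out-File "D:\ArchivalCred.txt"

#In the scripts, rebuild the credential without prompting anyone
$UserName  = "admin@samplesharenet.onmicrosoft.com"  #assumption: your admin account UPN
$SecurePwd = Get-Content "D:\ArchivalCred.txt" | ConvertTo-SecureString
$Credentials = New-Object System.Management.Automation.PSCredential($UserName, $SecurePwd)
```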
Output
You will find all the files listed in the CSV report that were selected for archival created in the target archive library, with their metadata properties preserved.
Creative Idea
You can merge the two scripts above and automate them using Flow/Azure Functions on a weekly/monthly schedule, which promotes automation without manual intervention.
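Short of Flow/Azure Functions, even a plain Windows Task Scheduler job can provide the schedule. A sketch, where the merged script path and task name are assumptions:

```powershell
#Run the combined archival script every Sunday at 2 AM
$Action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-File D:\Scripts\ArchiveLibrary.ps1"
$Trigger = New-ScheduledTaskTrigger -Weekly -DaysOfWeek Sunday -At 2am
Register-ScheduledTask -TaskName "SharePointLibraryArchival" -Action $Action -Trigger $Trigger
```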
You can also apply the above process to large lists that need to be archived.
Note
Since a complete Flow-based copy of files has a buffer size limit of 100 MB, we use PnP + CSOM PowerShell instead, which allows unlimited calls with no pricing or size limitations while preserving metadata.
Cheers!