Unlocking DSpace: Custom Solutions & Best Practices

The Apption Data Podcast - Episode 3

Looking to modernize your current repository software? DSpace is a robust and widely-used platform; however, overlooking certain simple roadblocks could quickly escalate your project costs.

DSpace has revolutionized the management and dissemination of digital assets, ranging from research papers to media files. Its versatile nature makes it a preferred choice for organizations worldwide, offering flexibility and scalability to accommodate diverse content types. With built-in tools and customizable features, DSpace empowers users to make their content accessible to stakeholders.

Steve Brush, Director of Development at Apption, dives into everything about DSpace, our use case with a government agency, lessons learned during implementation, best practices when creating a custom open-source repository, and what Apption can deliver for custom DSpace solutions.

What is DSpace?

DSpace is open-source digital repository software designed to host a variety of digital assets, most commonly research papers, journal articles, and scientific publications, though it can also host photos and other media assets. This flexibility makes it a versatile solution for organizations seeking to manage and disseminate their digital collections. DSpace enables users to make content publicly accessible and share it with different stakeholders, both through its built-in tools and through customizations.

DSpace was publicly released in 2002 and is now a standard in the field, with more than 2,000 institutions worldwide using the software. Initially developed at MIT and HP Labs, DSpace is now managed by the open-source community, with major players guiding its evolution through new releases. The “out of the box” software is free to install and can be extensively customized to fit an organization's needs.

Building a Custom DSpace Solution for a Government Agency

The implementation of a custom DSpace solution for a government agency presented an opportunity to modernize their existing infrastructure. The agency had been utilizing a legacy piece of software, implemented back in the 1980s, which had become outdated and lacked the modern features and capabilities required for operation. The agency sought to consolidate multiple repositories into a single, more streamlined location, enhancing accessibility and efficiency for their stakeholders.

To address these needs, our team at Apption was engaged to lead the project from a technical standpoint, collaborating with the agency's architecture and cloud teams. This involved navigating security requirements specific to the Government of Canada and customizing the DSpace software to meet the agency's unique needs. One key aspect of the customization involved ensuring seamless user authentication while adhering to security protocols. Our solution also included an enhancement that enables geospatial searching, using a map widget and polygon data attached to the publications in the repository.
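To illustrate the idea behind geospatial searching over polygon data, here is a minimal sketch in Python. It assumes the map widget yields a single clicked point and that each publication carries one simple polygon as a list of (longitude, latitude) pairs. The function names, record layout, and sample data are hypothetical, and this is not the implementation used in the project.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count the polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def search_by_location(point: Point, publications: List[Dict]) -> List[str]:
    """Return the ids of publications whose polygon contains the point."""
    return [p["id"] for p in publications if point_in_polygon(point, p["polygon"])]


# Hypothetical sample data: one publication covering a small rectangle.
publications = [
    {"id": "pub-1", "polygon": [(-77.0, 44.0), (-75.0, 44.0), (-75.0, 46.0), (-77.0, 46.0)]},
]
hits = search_by_location((-76.0, 45.0), publications)
```

A production search would more likely push this test down into the search index rather than scan records in application code; the sketch only shows the geometric check itself.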

Our team led a migration effort to transition data from the agency's existing repositories to the new platform. This involved migrating assets, metadata, and other content, ensuring a smooth transition without loss of data or functionality. The successful completion of the project resulted in the launch of a publicly accessible Government of Canada site, providing users with access to a wide range of digital assets from the agency.

Overall, this custom DSpace implementation was a success: it used the DSpace software to modernize and streamline the agency's repository infrastructure, and the same approach can be applied in government agencies, academic organizations, non-profits, and commercial organizations. By working closely with stakeholders and addressing specific requirements, we were able to deliver a custom, scalable solution that met the agency's needs while adhering to security standards.

Unexpected Challenges

We ran into some surprises while implementing this solution for the government agency. One of the initial challenges encountered was the complexity of setting up DSpace environments, even for basic configurations. Unlike a straightforward deployment, establishing a development environment required meticulous attention to detail and often took several days to complete. Additionally, deploying DSpace onto a cloud environment posed its own set of challenges, as existing Docker images provided by DSpace were not production-ready and required customization to align with requirements and security protocols.

Another hurdle concerned performance issues, particularly during the data migration phase. With approximately 150,000 publications to migrate, the time required for migration became a major concern. To address this, customizations were made to optimize DSpace's ingestion process for faster data migration. Additionally, the absence of native support for Azure within DSpace was unexpected, requiring the team to undertake custom development work to integrate Azure storage into the DSpace deployment, along with additional tweaks to accommodate Azure-specific requirements.
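The specific ingestion optimizations are not described here, but one common way to keep a large migration manageable is to process records in fixed-size batches, so that each batch can be submitted, timed, and retried on its own. The sketch below is a generic Python helper under that assumption; the batch size of 500 is an arbitrary illustrative value, not a figure from the project.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(records: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


# With roughly 150,000 publications and an assumed batch size of 500,
# the migration splits into 300 independent units of work.
batch_count = sum(1 for _ in batched(range(150_000), 500))
```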

Surprisingly, the front-end accessibility of DSpace also posed challenges, as it did not fully align with the accessibility standards preferred by the Government of Canada. This required modifications to the front-end interface to ensure compliance with accessibility requirements, adding complexity to the project beyond typical front-end development tasks. Despite these surprises, the team successfully navigated these challenges, delivering a custom DSpace solution that met the agency's needs while adhering to security standards and accessibility guidelines.

Best Practices

When embarking on a custom DSpace project, several best practices emerge from past experience:

  1. Pay meticulous attention to metadata schema design.

  2. Plan and map metadata fields carefully.

  3. Leverage standardized schemas.

  4. Adhere to DSpace’s rules to minimize customization.

  5. Map content comprehensively during migration.

  6. Utilize curation tasks and built-in features.

  7. Understand your client’s requirements.

Firstly, meticulous attention to metadata schema design is paramount. As DSpace primarily serves publications searchable through metadata, careful planning and mapping of metadata fields ensure efficient dissemination and utilization of your data. Leveraging standardized schemas like Dublin Core simplifies integration and maximizes the system's capabilities, such as utilizing built-in features like RSS feeds and OAI-PMH. Adhering to the rules DSpace has in place will minimize the need for extensive customizations post-deployment, streamlining the project's development and maintenance.
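As a rough illustration of metadata field mapping, the Python sketch below maps records from a legacy system onto qualified Dublin Core fields and reports any field that has no mapping, so that gaps surface before migration rather than after. The legacy field names (TITLE, AUTHOR, and so on) and the sample record are hypothetical; the target fields are standard Dublin Core names as used in DSpace's default metadata registry.

```python
from typing import Dict, List, Tuple

# Hypothetical legacy field names mapped to qualified Dublin Core fields.
FIELD_MAP: Dict[str, str] = {
    "TITLE": "dc.title",
    "AUTHOR": "dc.contributor.author",
    "PUB_DATE": "dc.date.issued",
    "ABSTRACT": "dc.description.abstract",
    "KEYWORDS": "dc.subject",
}


def map_record(legacy: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (mapped metadata, legacy fields that have no mapping)."""
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for field, value in legacy.items():
        target = FIELD_MAP.get(field)
        if target is None:
            unmapped.append(field)  # flag for review instead of dropping silently
            continue
        mapped.setdefault(target, []).append(value)
    return mapped, unmapped


# Hypothetical legacy record with one field the mapping does not cover.
mapped, unmapped = map_record(
    {"TITLE": "Sample Report", "AUTHOR": "Doe, Jane", "SHELF_NO": "A-12"}
)
```

Running a check like this over the whole legacy dataset early on gives a concrete list of fields that still need a home in the schema.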

Secondly, comprehensive planning and mapping during the migration process are essential. DSpace's curation tasks offer scheduled and ad-hoc jobs to maintain data integrity, aiding in cleaning up and indexing data efficiently. Particularly in multilingual environments, where content must be accessible in multiple languages, leveraging curation tasks becomes crucial. These tasks enable automatic indexing and display of content in both official languages, simplifying user interactions and ensuring compliance with organizational language requirements. Additionally, careful consideration of language-specific content display rules enhances user experience and ensures regulatory compliance, especially in organizations with bilingual mandates. By adhering to these best practices, custom DSpace projects can be executed efficiently while meeting organizational requirements and user needs effectively.
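A language-specific display rule can be as simple as "show the value in the user's language, otherwise fall back to the other official language." The Python sketch below shows that rule under the assumption that each metadata value carries a language tag; the function name and data layout are hypothetical and are not part of DSpace's API.

```python
from typing import Dict, List, Optional


def pick_for_language(
    values: List[Dict[str, Optional[str]]],
    preferred: str,
    fallback: str = "en",
) -> Optional[str]:
    """Return the value tagged with the preferred language, else the fallback."""
    for lang in (preferred, fallback):
        for entry in values:
            if entry.get("language") == lang:
                return entry["value"]
    # No tagged match: use the first value if there is one.
    return values[0]["value"] if values else None


# Hypothetical bilingual title metadata.
titles = [
    {"value": "Annual Report", "language": "en"},
    {"value": "Rapport annuel", "language": "fr"},
]
french_title = pick_for_language(titles, "fr")
```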

Apption’s Custom Solutions

Apption offers comprehensive services for implementing custom DSpace solutions, guiding organizations through every stage of the process. From initial planning to deployment and beyond, we provide expertise and support to ensure seamless adoption. Our services include engaging with hosting teams, provisioning environments, and managing CI/CD and DevOps tasks to facilitate smooth deployment. DSpace is built on top of a relatively new RESTful API, which we've leveraged for system integrations across the organization, both to serve data consumers and as the primary method for adding new publications to the repository. Additionally, we handle all aspects of project management, navigating through various stages and gates while providing technical leadership.
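For a sense of what a data consumer's integration with the REST API might look like, the Python sketch below builds a search URL. It assumes the discovery search endpoint (`/api/discover/search/objects`) and its `query`, `dsoType`, `page`, and `size` parameters as documented in the REST contract for DSpace 7 and later; the base URL is a placeholder, and the endpoint should be verified against the version you run.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, query: str, page: int = 0, size: int = 20) -> str:
    """Build a discovery search URL for items matching a free-text query."""
    params = {
        "query": query,
        "dsoType": "ITEM",  # assumed filter value: restrict results to items
        "page": page,
        "size": size,
    }
    return f"{base_url.rstrip('/')}/api/discover/search/objects?{urlencode(params)}"


# Placeholder base URL; a real deployment exposes its own REST root.
url = build_search_url("https://repository.example.org/server", "climate change")
```

The resulting URL can be fetched with any HTTP client; paging through results is then a matter of incrementing `page`.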

With proficiency in both back-end Java and front-end Angular development, we cover all facets of DSpace implementation. We assist in structuring DSpace installations, setting up communities and collections, and mapping out metadata schemas. Our experience extends to integrating essential functionalities like DOI provisioning through custom modules, such as one that connects with Crossref, enabling organizations to mint persistent identifiers effortlessly. Moreover, we specialize in configuring the Apache Solr search engine for efficient data retrieval, ensuring stable and reliable system performance. For organizations seeking a tailored DSpace solution, our services offer a comprehensive and reliable option.

Overall, implementing DSpace can modernize an organization's digital infrastructure by replacing legacy software. Depending on organizational requirements, the software may need customizations that bring complexities and surprises during implementation. By adhering to best practices, addressing challenges proactively, and leveraging our expertise, organizations can carry out that modernization successfully. Apption can guide organizations through every stage of the DSpace implementation process, offering comprehensive services tailored to unique needs. With our assistance, organizations can embark on their DSpace journey confidently, unlocking the full potential of this powerful digital repository software.

Listen to Episode 3 of the Apption Data Podcast - Unlocking DSpace: Custom Solutions & Best Practices below, where Steve Brush, Director of Development, dives into everything about DSpace, including his experience, implementation surprises, and best practices.
